Supervised fine-tuning of phi-1.5 on the MetaMathQA dataset. The results are as follows:

| Model | GSM8K Pass@1 | MATH Pass@1 |
|---|---|---|
| MPT-7B | 6.8 | 3.0 |
| Falcon-7B | 6.8 | 2.3 |
| LLaMA-1-7B | 11.0 | 2.9 |
| LLaMA-2-7B | 14.6 | 2.5 |
| MPT-30B | 15.2 | 3.1 |
| LLaMA-1-13B | 17.8 | 3.9 |
| GPT-Neo-2.7B | 19.5 | -- |
| Falcon-40B | 19.6 | 2.5 |
| Baichuan-chat-13B | 23.9 | -- |
| Vicuna-v1.3-13B | 27.6 | -- |
| LLaMA-2-13B | 28.7 | 3.9 |
| InternLM-7B | 31.2 | -- |
| ChatGLM-2-6B | 32.4 | -- |
| GPT-J-6B | 34.9 | -- |
| LLaMA-1-33B | 35.6 | 3.9 |
| LLaMA-2-34B | 42.2 | 6.24 |
| RFT-7B | 50.3 | -- |
| LLaMA-1-65B | 50.9 | 10.6 |
| Qwen-7B | 51.6 | -- |
| Phi1.5-1.3B | 54.3 | 15.5 |
| WizardMath-7B | 54.9 | 10.7 |
| LLaMA-2-70B | 56.8 | 13.5 |
| WizardMath-13B | 63.9 | 14.0 |
| MAmmoTH-7B (COT) | 50.5 | 10.4 |
| MAmmoTH-7B (POT+COT) | 53.6 | 31.5 |
| Arithmo-Mistral-7B | 74.7 | 25.3 |
| MetaMath-7B | 66.5 | 19.8 |
| MetaMath-13B | 72.3 | 22.4 |
| MetaMath-Mistral-7B | 77.7 | 28.2 |

With only 1.3B parameters, it reaches 54.3 Pass@1 on GSM8K and 15.5 Pass@1 on MATH, comparable to WizardMath-7B (54.9 / 10.7).

You can reproduce these results with the MetaMath evaluation code.
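For readers who want to see what a Pass@1 score measures before running the full evaluation, here is a minimal sketch of the scoring step. It is not the MetaMath evaluation code itself. It assumes that each model completion ends with a line of the form `The answer is: <value>` and that answers are numeric, as in GSM8K; the marker string and the helper names `extract_answer`, `to_number`, `is_correct`, and `pass_at_1` are illustrative choices, not names taken from this model card.

```python
import re
from typing import List, Optional

# Assumption: completions end with this marker followed by the final answer.
ANSWER_MARKER = "The answer is:"


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last answer marker, or None if it is absent."""
    if ANSWER_MARKER not in completion:
        return None
    tail = completion.rsplit(ANSWER_MARKER, 1)[1].strip()
    # Keep only the first line and drop a trailing period.
    tail = tail.splitlines()[0].strip().rstrip(".") if tail else ""
    return tail or None


def to_number(text: str) -> Optional[float]:
    """Parse the first number in text, ignoring thousands separators and '$'."""
    cleaned = text.replace(",", "").replace("$", "").strip()
    match = re.search(r"-?\d+(?:\.\d+)?", cleaned)
    return float(match.group()) if match else None


def is_correct(completion: str, reference: str) -> bool:
    """True if the extracted answer matches the reference numerically."""
    predicted = extract_answer(completion)
    if predicted is None:
        return False
    p, r = to_number(predicted), to_number(reference)
    if p is None or r is None:
        return False
    return abs(p - r) < 1e-6


def pass_at_1(completions: List[str], references: List[str]) -> float:
    """Percentage of problems solved with a single completion per problem."""
    if len(completions) != len(references):
        raise ValueError("completions and references must have the same length")
    if not completions:
        return 0.0
    correct = sum(is_correct(c, r) for c, r in zip(completions, references))
    return 100.0 * correct / len(completions)
```

With one greedy completion per problem, `pass_at_1(completions, references)` returns a percentage on the same scale as the table above. MATH answers are often symbolic expressions, so a numeric comparison like this one would only be appropriate for GSM8K-style problems.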


Dataset used to train klein-zcy/Phi-1_5-MetaMathQA