File size: 8,935 Bytes
e2858b5
 
c863a93
 
 
 
5d779ad
 
e2858b5
816e24e
aff4c18
 
 
 
 
 
 
 
 
816e24e
c863a93
 
404abbd
 
 
 
5d779ad
c863a93
 
 
2f85882
c863a93
 
 
64b3807
d0475fe
c863a93
5d779ad
c863a93
 
 
5d779ad
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
e561311
5d779ad
 
 
 
 
 
 
 
 
 
 
 
 
c863a93
72360ff
c863a93
5d779ad
 
 
 
 
 
 
c863a93
5d779ad
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
c863a93
 
 
 
 
 
5d779ad
72360ff
 
 
 
 
5d779ad
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2abc059
 
5d059dc
816e24e
 
f43c596
816e24e
5d059dc
 
aff4c18
c13aa1e
 
 
 
 
 
 
 
 
 
 
 
aff4c18
5d059dc
2abc059
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
---
license: apache-2.0
language:
- en
tags:
- Mathematical Reasoning
datasets:
- akjindal53244/Arithmo-Data
---

## [January 2024] New Model Release: Arithmo2-Mistral-7B

**Arithmo2-Mistral-7B** model improves initially released Arithmo-Mistral-7B model on both GSM8K and MATH benchmarks. Specifically, there is **absolute** improvement of:
- +1.7% on GSM8K
- +3.0% on GSM8K PoT
- +1.9% on MATH

<b>Note</b>: <span style="color:red"><b>It is recommended to use Arithmo2-Mistral-7B model</b></span>. Here is the [merged model](https://huggingface.co/upaya07/Arithmo2-Mistral-7B) and corresponding [LoRA Adapter](https://huggingface.co/upaya07/Arithmo2-Mistral-7B-adapter).


# Model Card for Model ID

[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](CODE_LICENSE)
[![Model Weight License](https://img.shields.io/badge/Model%20Weights%20License-Apache_2.0-green.svg)](LICENSE)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)

**P.S.:** Please reach out to [Ashvini Jindal](https://www.linkedin.com/in/ashvini-jindal-26653262/) if you would be interested in supporting compute need. We are looking for small-scale support so we'd appreciate any kind of help! :)

## Model Details

Arithmo-Mistral-7B is trained to reason and answer mathematical problems and is also capable of writing a Python program that upon execution prints answer to the question. We used [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) as a base model and used **QLoRA to fine-tune it on a single RTX 4090 GPU**.

### Model Description

- **Project GitHub Page:** https://github.com/akjindal53244/Arithmo-Mistral-7B
- **Developed by:** [Ashvini Kumar Jindal](https://www.linkedin.com/in/ashvini-jindal-26653262/), [Ankur Parikh](https://www.linkedin.com/in/ankurnlpexpert/)
- **Funded by:** self-work
- **Model type:** fine-tuned
- **Language(s) (NLP):** English
- **Finetuned from model:** mistralai/Mistral-7B-v0.1

## Results

Arithmo-Mistral-7B outperforms existing 7B and 13B state-of-the-art Mathematical Reasoning models. Refer to [Comparing Arithmo-Mistral-7B with other LLM models](https://github.com/akjindal53244/Arithmo-Mistral-7B/tree/master#comparing-arithmo-mistral-7b-with-other-llm-models) section for more details.

<table>
    <thead>
        <tr>
            <th>Prompt Approach</th>
            <th>GSM8k</th>
            <th>MATH</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Zero-Shot CoT</td>
            <td><b>74.7</b></td>
            <td><b>25.3</b></td>
        </tr>
        <tr>
            <td>Zero-Shot PoT</td>
            <td><b>71.2</b></td>
            <td>-</td>
        </tr>
    </tbody>
</table>

- **Zero-Shot CoT**: On providing a question as prompt, model generates reasoning steps to solve the question along with answer. We check if answer matches with ground-truth.
- **Zero-Shot PoT**: We prompt the model to generate a Python program for the given question. During inference, we execute the Python program generated by the model and check if the program output matches with ground-truth answer.


## Installation

```
pip install transformers >=4.34.0
pip install accelerate
pip install sentencepiece
pip install protobuf

# If you are GPU poor like me
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# If you have a GPU.
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118
pip install scipy
pip install bitsandbytes
```


## How to query the model

```
# Set `run_model_on_gpu` to `False` if you are running on CPU. Model will generate reasoning steps with answer for your question. If you want to generate Python program, uncomment line-69 that adds a Python prompt.
# This script automatically does formatting for you, so you just need to type question (eg: `What is 2+2?`) without any prefix like `Question:`, etc.**

$ python query_model.py
```
**Note:** Above script automatically does formatting for you, so you just need to type question (eg: `What is 2+2?`) without any prefix like `Question:`, etc. Checkout [query_model.py](https://github.com/akjindal53244/Arithmo-Mistral-7B/blob/master/query_model.py) for more details. <br><br>

##### Sample Input:
```
Question: There are total 10 children. I have to give 1 apple to first child, 2 apples to second child, 3 apples to third child, and so on. How many apples do I need?
```
##### Model Output:
```
Answer: The total number of apples needed is the sum of the first 10 positive integers.
This can be calculated using the formula for the sum of an arithmetic series:
\[S = \frac{n}{2}(a_1 + a_n),\]
where $S$ is the sum, $n$ is the number of terms, $a_1$ is the first term, and $a_n$ is the last term.
In this case, $n = 10$, $a_1 = 1$, and $a_n = 10$.
Plugging these values into the formula, we get:
\[S = \frac{10}{2}(1 + 10) = 5(11) = \boxed{55}.\]
The answer is: 55
```

Arithmo-Mistral-7B is trained with the following format:
#### CoT Format (generate reasoning steps with answer):
```
Question: <question>

Answer:
```

#### PoT Format (generate a python program):
```
Question: <question> <python_prompt>

Answer:
```
It will perform best if queried in this way with your own script.

## Comparing Arithmo-Mistral-7B with other LLM models.
Results for all models except `Arithmo-Mistral-7B` are taken from [MetaMath](https://github.com/meta-math/MetaMath/blob/main/README.MD) repository.

| Model               | GSM8k Pass@1 | MATH Pass@1 |
|---------------------|--------------|-------------|
| MPT-7B              | 6.8          | 3.0         |
| Falcon-7B           | 6.8          | 2.3         |
| LLaMA-1-7B          | 11.0         | 2.9         |
| LLaMA-2-7B          | 14.6         | 2.5         |
| MPT-30B             | 15.2         | 3.1         |
| LLaMA-1-13B         | 17.8         | 3.9         |
| GPT-Neo-2.7B        | 19.5         | --          |
| Falcon-40B          | 19.6         | 2.5         |
| Baichuan-chat-13B   | 23.9         | --          |
| Vicuna-v1.3-13B     | 27.6         | --          |
| LLaMA-2-13B         | 28.7         | 3.9         |
| InternLM-7B         | 31.2         | --          |
| ChatGLM-2-6B        | 32.4         | --          |
| GPT-J-6B            | 34.9         | --          |
| LLaMA-1-33B         | 35.6         | 3.9         |
| LLaMA-2-34B         | 42.2         | 6.24        |
| RFT-7B              | 50.3         | --          |
| LLaMA-1-65B         | 50.9         | 10.6        |
| Qwen-7B             | 51.6         | --          |
| WizardMath-7B       | 54.9         | 10.7        |
| LLaMA-2-70B         | 56.8         | 13.5        |
| WizardMath-13B      | 63.9         | 14.0        |
| MetaMath-7B         | 66.5         | 19.8        |
| MetaMath-13B        | 72.3         | 22.4        |
| 🔥 **Arithmo-Mistral-7B Zero-Shot PoT**  | **71.2** | --       |
| 🔥 **Arithmo-Mistral-7B Zero-Shot CoT**  | **74.7** | **25.3**       |
| WizardMath-70B      | **81.6**     | 22.7        |
| MetaMath-70B        | **82.3**     | **26.6**        |


If you are interested in reproducing the resullts, visit https://github.com/akjindal53244/Arithmo-Mistral-7B#reproducing-results section.


### Support My Work

Building LLMs takes time and resources; if you find my work interesting, your support would be epic!
<a href="https://www.buymeacoffee.com/a_little_learner" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>


### Citation
To cite Arithmo models:
```
@misc{jindal_2023_arithmo,
  author = {Jindal, Ashvini},
  title = {Arithmo-Mistral-7B: Mathematical Reasoning Model},
  howpublished = {Hugging Face},
  month = {October},
  year = {2023},
  url = {https://huggingface.co/akjindal53244/Arithmo-Mistral-7B}
}
```



<h2 id="References">References</h2>

```
@article{yu2023metamath,
  title={MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models},
  author={Yu, Longhui and Jiang, Weisen and Shi, Han and Yu, Jincheng and Liu, Zhengying and Zhang, Yu and Kwok, James T and Li, Zhenguo and Weller, Adrian and Liu, Weiyang},
  journal={arXiv preprint arXiv:2309.12284},
  year={2023}
}

@article{Yue2023mammoth,
  title={MAmmoTH: Building math generalist models through hybrid instruction tuning},
  author={Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, and Wenhu Chen},
  journal={arXiv preprint arXiv:2309.05653},
  year={2023}
}

@article{mishra2022lila,
  title={Lila: A unified benchmark for mathematical reasoning},
  author={Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, and Ashwin Kalyan},
  journal={arXiv preprint arXiv:2210.17517},
  year={2022}
}

```