Files changed (2)
  1. README.md +87 -136
  2. tokenizer_config.json +1 -1
README.md CHANGED
@@ -7,21 +7,16 @@ language:
  tags:
  - falcon3
  base_model: tiiuae/Falcon3-7B-Base
- license: other
- license_name: falcon-llm-license
+ license: other
+ license_name: falcon-llm-license
  license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
- library_name: transformers
  ---

- <div align="center">
- <img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/general/falco3-logo.png" alt="drawing" width="500"/>
- </div>
-
  # Falcon3-7B-Instruct

  The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.

- This repository contains the **Falcon3-7B-Instruct**. It achieves state of art results (at the time of release) on reasoning, language understanding, instruction following, code and mathematics tasks.
+ This repository contains **Falcon3-7B-Instruct**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code and mathematics tasks.
  Falcon3-7B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.

  ## Model Details
@@ -34,7 +29,7 @@ Falcon3-7B-Instruct supports 4 languages (english, french, spanish, portuguese)
  - Uses SwiGLU and RMSNorm
  - 32K context length
  - 131K vocab size
- - Pretrained on 14 Teratokens of datasets comprising of web, code, STEM, high quality and mutlilingual data using 1024 H100 GPU chips
+ - Pretrained on 14 Teratokens of data comprising web, code, STEM, high-quality and multilingual data using 2048 H100 GPU chips
  - Posttrained on 1.2 million samples of STEM, conversations, code, safety and function-call data
  - Supports EN, FR, ES, PT
  - Developed by [Technology Innovation Institute](https://www.tii.ae)
@@ -91,66 +86,7 @@ print(response)
  <br>

  ## Benchmarks
- We report the official HuggingFace leaderboard normalized evaluations [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) in the following table.
- <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
- <colgroup>
- <col style="width: 10%;">
- <col style="width: 7%;">
- <col style="width: 7%;">
- <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
- </colgroup>
- <thead>
- <tr>
- <th>Benchmark</th>
- <th>Llama-3.1-8B-Instruct</th>
- <th>Qwen2.5-7B-Instruct</th>
- <th>Falcon3-7B-Instruct</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td>IFEval</td>
- <td><b>78.56</b></td>
- <td>75.85</td>
- <td>76.12</td>
- </tr>
- <tr>
- <td>BBH (3-shot)</td>
- <td>29.89</td>
- <td>34.89</td>
- <td><b>37.92</b></td>
- </tr>
- <tr>
- <td>MATH Lvl-5 (4-shot)</td>
- <td>19.34</td>
- <td>0.00</td>
- <td><b>31.87</b></td>
- </tr>
- <tr>
- <td>GPQA (0-shot)</td>
- <td>2.35</td>
- <td>5.48</td>
- <td><b>8.05</b></td>
- </tr>
- <tr>
- <td>MUSR (0-shot)</td>
- <td>8.41</td>
- <td>8.45</td>
- <td><b>21.17</b></td>
- </tr>
- <tr>
- <td>MMLU-PRO (5-shot)</td>
- <td>30.68</td>
- <td><b>36.52</b></td>
- <td>34.30</td>
- </tr>
- </tbody>
- </table>
-
- Also, we report in the following table our internal pipeline benchmarks.
- - We use [lm-evaluation harness](https://github.com/EleutherAI/lm-evaluation-harness).
- - We report **raw scores** obtained by applying chat template and fewshot_as_multiturn.
- - We use same batch-size across all models.
+ We report in the following table our internal pipeline benchmarks:

  <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
  <colgroup>
@@ -158,139 +94,154 @@
  <col style="width: 10%;">
  <col style="width: 7%;">
  <col style="width: 7%;">
+ <col style="width: 7%;">
  <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
  </colgroup>
  <thead>
  <tr>
  <th>Category</th>
  <th>Benchmark</th>
- <th>Llama-3.1-8B-Instruct</th>
- <th>Qwen2.5-7B-Instruct</th>
- <th>Falcon3-7B-Instruct</th>
+ <th>Llama-3.2-1B</th>
+ <th>Qwen2.5-1.5B</th>
+ <th>SmolLM2-1.7B</th>
+ <th>Falcon3-1B-Instruct</th>
  </tr>
  </thead>
  <tbody>
  <tr>
  <td rowspan="3">General</td>
  <td>MMLU (5-shot)</td>
- <td>68.2</td>
- <td><b>73.5</b></td>
- <td>70.5</td>
+ <td>23.4</td>
+ <td><b>58.4</b></td>
+ <td>48.4</td>
+ <td>43.9</td>
  </tr>
  <tr>
  <td>MMLU-PRO (5-shot)</td>
- <td>36.4</td>
- <td><b>43.1</b></td>
- <td>40.7</td>
+ <td>11.3</td>
+ <td><b>21.3</b></td>
+ <td>17.2</td>
+ <td>18.6</td>
  </tr>
  <tr>
  <td>IFEval</td>
- <td><b>78.8</b></td>
- <td>74.7</td>
- <td>76.5</td>
+ <td><b>55.8</b></td>
+ <td>44.4</td>
+ <td>53.0</td>
+ <td>54.4</td>
  </tr>
  <tr>
  <td rowspan="3">Math</td>
  <td>GSM8K (5-shot)</td>
- <td><b>82.6</b></td>
- <td>72.0</td>
- <td>81.4</td>
+ <td>37.4</td>
+ <td><b>57.2</b></td>
+ <td>43.4</td>
+ <td>38.6</td>
  </tr>
  <tr>
  <td>GSM8K (8-shot, COT)</td>
- <td><b>85.4</b></td>
- <td>76.6</td>
- <td>79.7</td>
+ <td>35.6</td>
+ <td><b>62.2</b></td>
+ <td>47.2</td>
+ <td>41.8</td>
  </tr>
  <tr>
  <td>MATH Lvl-5 (4-shot)</td>
- <td>15.4</td>
- <td>-</td>
- <td><b>29.4</b></td>
+ <td><b>3.9</b></td>
+ <td>0.2</td>
+ <td>0.1</td>
+ <td>1.0</td>
  </tr>
  <tr>
- <td rowspan="5">Reasoning</td>
+ <td rowspan="6">Reasoning</td>
  <td>Arc Challenge (25-shot)</td>
- <td>58.6</td>
- <td>57.8</td>
- <td><b>62.6</b></td>
+ <td>34.1</td>
+ <td>47.0</td>
+ <td><b>47.6</b></td>
+ <td>45.9</td>
  </tr>
  <tr>
  <td>GPQA (0-shot)</td>
- <td><b>33.5</b></td>
- <td>32</td>
- <td>31.9</td>
+ <td>25.3</td>
+ <td><b>29.6</b></td>
+ <td>28.7</td>
+ <td>26.5</td>
  </tr>
  <tr>
  <td>GPQA (0-shot, COT)</td>
- <td>9.6</td>
- <td>13.8</td>
- <td><b>22.3</b></td>
+ <td>13.2</td>
+ <td>9.2</td>
+ <td>16.0</td>
+ <td><b>21.3</b></td>
  </tr>
  <tr>
  <td>MUSR (0-shot)</td>
- <td>38.6</td>
- <td>41</td>
- <td><b>46.4</b></td>
+ <td>32.4</td>
+ <td>36.8</td>
+ <td>33.0</td>
+ <td><b>40.7</b></td>
  </tr>
  <tr>
  <td>BBH (3-shot)</td>
- <td>48.6</td>
- <td><b>54.1</b></td>
- <td>52.4</td>
+ <td>30.3</td>
+ <td><b>38.5</b></td>
+ <td>33.1</td>
+ <td>35.1</td>
+ </tr>
+ <tr>
+ <td>BBH (3-shot, COT)</td>
+ <td>0.0</td>
+ <td>20.3</td>
+ <td>0.8</td>
+ <td><b>30.5</b></td>
  </tr>
  <tr>
- <td rowspan="4">CommonSense Understanding</td>
+ <td rowspan="5">CommonSense Understanding</td>
  <td>PIQA (0-shot)</td>
- <td><b>78.9</b></td>
- <td>73.7</td>
- <td>78.8</td>
+ <td>72.1</td>
+ <td>73.2</td>
+ <td><b>74.4</b></td>
+ <td>72.0</td>
  </tr>
  <tr>
  <td>SciQ (0-shot)</td>
- <td>80.2</td>
- <td>50.9</td>
- <td><b>94.7</b></td>
+ <td>61.8</td>
+ <td>69.5</td>
+ <td>71.4</td>
+ <td><b>86.8</b></td>
  </tr>
  <tr>
  <td>Winogrande (0-shot)</td>
  <td>-</td>
  <td>-</td>
- <td>70.4</td>
+ <td>-</td>
+ <td><b>60.2</b></td>
  </tr>
  <tr>
  <td>OpenbookQA (0-shot)</td>
- <td><b>46.2</b></td>
- <td>42.4</td>
- <td>45.8</td>
+ <td>40.2</td>
+ <td>40.4</td>
+ <td><b>42.8</b></td>
+ <td>40.0</td>
  </tr>
  <tr>
- <td rowspan="2">Instructions following</td>
  <td>MT-Bench (avg)</td>
- <td>7.9</td>
- <td><b>8.5</b></td>
- <td>8.4</td>
+ <td>5.4</td>
+ <td><b>7.1</b></td>
+ <td>6.1</td>
+ <td>5.5</td>
  </tr>
  <tr>
- <td>Alpaca (WC)</td>
- <td>26.6</td>
- <td><b>31.5</b></td>
- <td>26.1</td>
- </tr>
- <tr>
- <td>Tool use</td>
- <td>BFCL AST (avg)</td>
- <td>90.6</td>
- <td><b>91.4</b></td>
- <td>89.5</td>
+ <td rowspan="1">Instructions following</td>
+ <td>Alpaca (WC)</td>
+ <td><b>8.6</b></td>
+ <td><b>8.6</b></td>
+ <td>5.4</td>
+ <td>6.1</td>
  </tr>
  </tbody>
  </table>

- ## Useful links
- - View our [release blogpost](https://huggingface.co/blog/falcon3).
- - Feel free to join [our discord server](https://discord.gg/fwXpMyGc) if you have any questions or to interact with our researchers and developers.
-
  ## Technical Report
  Coming soon....
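The Benchmarks hunk above is anchored on `print(response)`, the last line of the card's quick-start snippet, which follows the standard `transformers` chat-template flow. A minimal sketch of that flow, assuming the usual `AutoTokenizer`/`AutoModelForCausalLM` API; the prompt, dtype/device choices, and generation settings below are illustrative rather than the card's exact snippet:

```python
# Minimal usage sketch for tiiuae/Falcon3-7B-Instruct, assuming the standard
# transformers chat-template API; dtype/device and generation settings are
# illustrative, not prescribed by the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/Falcon3-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Falcon3 model family in one sentence."},
]
# Renders the <|system|>/<|user|>/<|assistant|> format defined by the
# chat_template in tokenizer_config.json (diffed below).
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```

Slicing `outputs[0]` at `input_ids.shape[-1]` strips the echoed prompt, so `response` holds only the newly generated assistant turn.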
tokenizer_config.json CHANGED
@@ -16219,7 +16219,7 @@
  ">>PASSWORD<<",
  ">>KEY<<"
  ],
- "chat_template": "{%- if tools %}\n{{- '<|system|>\\n' }}\n{%- if messages[0]['role'] == 'system' %}\n{{- messages[0]['content'] }}\n{%- set remaining_messages = messages[1:] %}\n{%- else %}\n{%- set remaining_messages = messages %}\n{%- endif %}\n{{- 'You are a Falcon assistant skilled in function calling. You are helpful, respectful, and concise.\\n\\n# Tools\\n\\nYou have access to the following functions. You MUST use them to answer questions when needed. For each function call, you MUST return a JSON object inside <tool_call></tool_call> tags.\\n\\n<tools>' + tools|tojson(indent=2) + '</tools>\\n\\n# Output Format\\n\\nYour response MUST follow this format when making function calls:\\n<tool_call>\\n[\\n {\"name\": \"function_name\", \"arguments\": {\"arg1\": \"value1\", \"arg2\": \"value2\"}},\\n {\"name\": \"another_function\", \"arguments\": {\"arg\": \"value\"}}\\n]\\n</tool_call>\\nIf no function calls are needed, respond normally without the tool_call tags.\\n' }}\n{%- for message in remaining_messages %}\n{%- if message['role'] == 'user' %}\n{{- '<|user|>\\n' + message['content'] + '\\n' }}\n{%- elif message['role'] == 'assistant' %}\n{%- if message.content %}\n{{- '<|assistant|>\\n' + message['content'] }}\n{%- endif %}\n{%- if message.tool_calls %}\n{{- '\\n<tool_call>\\n' }}\n{{- message.tool_calls|tojson(indent=2) }}\n{{- '\\n</tool_call>' }}\n{%- endif %}\n{{- eos_token + '\\n' }}\n{%- elif message['role'] == 'tool' %}\n{{- '<|assistant|>\\n<tool_response>\\n' + message['content'] + '\\n</tool_response>\\n' }}\n{%- endif %}\n{%- endfor %}\n{{- '<|assistant|>\\n' if add_generation_prompt }}\n{%- else %}\n{%- for message in messages %}\n{%- if message['role'] == 'system' %}\n{{- '<|system|>\\n' + message['content'] + '\\n' }}\n{%- elif message['role'] == 'user' %}\n{{- '<|user|>\\n' + message['content'] + '\\n' }}\n{%- elif message['role'] == 'assistant' %}\n{%- if not loop.last %}\n{{- '<|assistant|>\\n' + message['content'] + eos_token + '\\n' }}\n{%- else %}\n{{- '<|assistant|>\\n' + message['content'] + eos_token }}\n{%- endif %}\n{%- endif %}\n{%- if loop.last and add_generation_prompt %}\n{{- '<|assistant|>\\n' }}\n{%- endif %}\n{%- endfor %}\n{%- endif %}",
+ "chat_template": "{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|system|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'user' %}{{ '<|user|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'assistant' %}{% if not loop.last %}{{ '<|assistant|>\n' + message['content'] + eos_token + '\n' }}{% else %}{{ '<|assistant|>\n' + message['content'] + eos_token }}{% endif %}{% endif %}{% if loop.last and add_generation_prompt %}{{ '<|assistant|>\n' }}{% endif %}{% endfor %}",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "extra_special_tokens": {},