Update README.md
README.md CHANGED

@@ -14,12 +14,11 @@ base_model:
 
 This is a continual-pre-training of [Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) on a mix of 📐 [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath) (our new high quality math dataset) and [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).
 
-The model demonstrates superior math performance compared to Llama 3.2 3B, while
+The model demonstrates superior math performance compared to Llama 3.2 3B, while maintaining similar performance on knowledge, reasoning, and common sense benchmarks:
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/HZ6KOc8IVXXOABrdv0dyK.png)
 
-It was trained on **160B tokens** using a mix of 40% FineWeb-Edu and 30% FineMath-4+ and 30% InfiWebMath-4+
-
+It was trained on **160B tokens** using a mix of 40% FineWeb-Edu and 60% from FineMath (30% FineMath-4+ subset and 30% InfiWebMath-4+ subset). We use [nanotron](https://github.com/huggingface/smollm/tree/main/pre-training/continual-pretraining) for the training, and you can find the training scripts in our [SmolLM2 GitHub repo](https://github.com/huggingface/smollm).
 
 ## Use
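The added line above pins the training mixture at 40% FineWeb-Edu, 30% FineMath-4+, and 30% InfiWebMath-4+. As a rough illustration only (the actual run used nanotron, per the card), the same mixture can be approximated with the 🤗 `datasets` library. The subset names `finemath-4plus` and `infiwebmath-4plus` follow the FineMath dataset card, but the streaming and column choices here are assumptions, not the training pipeline:

```python
# Sketch: approximating the 40/30/30 token mixture with the `datasets` library.
# NOTE: illustration only, not the actual pipeline (which used nanotron);
# subset/config names are assumptions based on the dataset cards.
from datasets import load_dataset, interleave_datasets

cols = ["text"]
fineweb_edu = load_dataset(
    "HuggingFaceFW/fineweb-edu", split="train", streaming=True
).select_columns(cols)
finemath = load_dataset(
    "HuggingFaceTB/finemath", "finemath-4plus", split="train", streaming=True
).select_columns(cols)
infiwebmath = load_dataset(
    "HuggingFaceTB/finemath", "infiwebmath-4plus", split="train", streaming=True
).select_columns(cols)

# Sample per example: 40% FineWeb-Edu, 30% FineMath-4+, 30% InfiWebMath-4+.
mixture = interleave_datasets(
    [fineweb_edu, finemath, infiwebmath],
    probabilities=[0.4, 0.3, 0.3],
    seed=42,
)

for example in mixture.take(3):
    print(example["text"][:100])
```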
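The hunk ends at the card's `## Use` heading, whose body is not shown in this diff. For orientation, below is a minimal, hypothetical loading snippet in the style such model cards usually carry; the repo id is an assumption inferred from the card, not something this commit confirms:

```python
# Minimal sketch of loading the model with transformers.
# The repo id is an assumption inferred from the model card, not shown in this diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/FineMath-Llama-3B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is 12 * 7?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```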