Update README.md
README.md
CHANGED
@@ -12,18 +12,20 @@ base_model:
-This
-- **License**: Apache-2
-- **Languages**: English
-This model was trained on English math data and is not instruction-tuned, making it intended for text completion in English
-It is important to note that the primary intended use case of this model is to compare its performance with other models trained under the same conditions. This model is not necessarily the best possible outcome achievable with the given dataset.

@@ -31,7 +33,7 @@ It is important to note that the primary intended use case of this model is to c
-model = "HuggingFaceTB/

@@ -48,20 +50,20 @@ We are releasing intermediate checkpoints for this model at intervals of every 1
-model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/
-out = list_repo_refs("HuggingFaceTB/
-- **Pretraining steps**:
-- **Pretraining tokens**:
## Model summary

This is a continual pre-training of [Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) on a mix of 📐 [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath) (our new high-quality math dataset) and [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).

The model demonstrates superior math performance compared to Llama 3.2 3B, while maintaining similar performance on knowledge, reasoning and common-sense benchmarks:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/HZ6KOc8IVXXOABrdv0dyK.png)

It was trained on **160B tokens** using a mix of 40% FineWeb-Edu, 30% FineMath-4+ and 30% InfiWebMath-4+ from FineMath. We used [nanotron](https://github.com/huggingface/smollm/tree/main/pre-training/continual-pretraining) for the training. You can find the training scripts in our [SmolLM2 GitHub repo](https://github.com/huggingface/smollm).
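If you want to inspect the training mix, the two FineMath subsets mentioned above can be sampled directly with 🤗 `datasets`. This is a minimal sketch; the config names `finemath-4plus` and `infiwebmath-4plus` are assumptions based on the subset names used here, so check them against the FineMath dataset card:

```python
from datasets import load_dataset

# The config names are assumed from the subset names above ("FineMath-4+" and
# "InfiWebMath-4+"); verify them on the FineMath dataset card before relying on them.
for config in ["finemath-4plus", "infiwebmath-4plus"]:
    ds = load_dataset("HuggingFaceTB/finemath", config, split="train", streaming=True)
    sample = next(iter(ds))  # stream one document instead of downloading the whole split
    print(config, list(sample.keys()))
```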
## Use

### Intended use

This model was trained on English math data and is not instruction-tuned, making it intended for text completion in English.

### Generation
```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model = "HuggingFaceTB/FineMath-Llama-3B"
device = "cuda" # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(model)
```
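A minimal sketch of how the snippet above can be continued into a full completion call, assuming the standard `transformers` generation API (the prompt and the `max_new_tokens` value are only illustrations):

```python
# Continuing from the snippet above: `model` currently holds the repo id string,
# and the same name is reused here for the loaded model.
model = AutoModelForCausalLM.from_pretrained(model).to(device)

inputs = tokenizer.encode("To solve 3x + 2 = 14, first", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```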
You can load a specific model revision with `transformers` using the argument `revision`:
```python
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/FineMath-Llama-3B", revision="10B")
```
You can access all the revisions for the models via the following code:
```python
from huggingface_hub import list_repo_refs
out = list_repo_refs("HuggingFaceTB/FineMath-Llama-3B")
print([b.name for b in out.branches])
```
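The two snippets combine naturally if you want to sweep over the intermediate checkpoints, for example to track a metric across training. A rough sketch, assuming every branch other than `main` is a checkpoint revision:

```python
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM

repo = "HuggingFaceTB/FineMath-Llama-3B"
revisions = [b.name for b in list_repo_refs(repo).branches if b.name != "main"]

for rev in revisions:
    model = AutoModelForCausalLM.from_pretrained(repo, revision=rev)
    # ... evaluate this checkpoint, then free it before loading the next one ...
    del model
```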
## Training
### Model
- **Architecture**: Llama3
- **Pretraining steps**: 160k
- **Pretraining tokens**: 160B
- **Precision**: bfloat16
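Taken together, the step and token counts work out to roughly one million tokens per optimizer step; a quick check, assuming both figures are exact round numbers:

```python
tokens = 160e9  # pretraining tokens
steps = 160e3   # pretraining steps
print(f"{tokens / steps / 1e6:.1f}M tokens per step")  # 1.0M
```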
### Hardware