loubnabnl (HF staff) committed 5b03bc3 (verified) · 1 parent: fe526b3

Update README.md

Files changed (1): README.md (+13 -11)

README.md CHANGED
@@ -12,18 +12,20 @@ base_model:

 ## Model summary

- This model is part of the 📐 [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath) ablations, we continue pretraining [Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) base on different math datasets for 60B tokens.
- The model has 3.21B parameters and 4096 context length. It was trained on **160B tokens** using a mix of 40% [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) and 30% FineMath-4+ and 30% InfiWebMath-4+ from the 📐 [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath) dataset.
+ This is a continual pre-training of [Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) on a mix of 📐 [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath) (our new high-quality math dataset) and [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).
+
+ The model demonstrates superior math performance compared to Llama 3.2 3B, while having similar performance on knowledge, reasoning, and common-sense benchmarks:
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/HZ6KOc8IVXXOABrdv0dyK.png)
+
+ It was trained on **160B tokens** using a mix of 40% FineWeb-Edu, 30% FineMath-4+, and 30% InfiWebMath-4+ from FineMath. We use [nanotron](https://github.com/huggingface/smollm/tree/main/pre-training/continual-pretraining) for the training. You can find the training scripts in our [SmolLM2 GitHub repo](https://github.com/huggingface/smollm).

- - **License**: Apache-2
- - **Languages**: English

 ## Use

 ### Intended use

- This model was trained on English math data and is not instruction-tuned, making it intended for text completion in English with a focus on math.
- It is important to note that the primary intended use case of this model is to compare its performance with other models trained under the same conditions. This model is not necessarily the best possible outcome achievable with the given dataset.
+ This model was trained on English math data and is not instruction-tuned, making it intended for text completion in English.

 ### Generation

@@ -31,7 +33,7 @@ It is important to note that the primary intended use case of this model is to c
 # pip install -q transformers
 from transformers import AutoModelForCausalLM, AutoTokenizer

- model = "HuggingFaceTB/finemath-ablation-4plus-160B"
+ model = "HuggingFaceTB/FineMath-Llama-3B"
 device = "cuda" # for GPU usage or "cpu" for CPU usage

 tokenizer = AutoTokenizer.from_pretrained(model)
@@ -48,20 +50,20 @@ We are releasing intermediate checkpoints for this model at intervals of every 1

 You can load a specific model revision with `transformers` using the argument `revision`:
 ```python
- model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/finemath-ablation-4plus-160B", revision="10B")
+ model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/FineMath-Llama-3B", revision="10B")
 ```
 You can access all the revisions for the models via the following code:
 ```python
 from huggingface_hub import list_repo_refs
- out = list_repo_refs("HuggingFaceTB/finemath-ablation-4plus-160B")
+ out = list_repo_refs("HuggingFaceTB/FineMath-Llama-3B")
 print([b.name for b in out.branches])
 ```

 ## Training
 ### Model
 - **Architecture**: Llama3
- - **Pretraining steps**: 60k
- - **Pretraining tokens**: 60B
+ - **Pretraining steps**: 160k
+ - **Pretraining tokens**: 160B
 - **Precision**: bfloat16

 ### Hardware
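
As a quick sanity check on the updated summary, the 40/30/30 mixture over 160B training tokens implies the per-source token budgets below. This is a back-of-the-envelope sketch; the exact per-source counts are not stated in the card.

```python
# Rough per-source token budgets implied by the mixture quoted in the
# updated model summary (160B total training tokens, 40/30/30 split).
# Derived arithmetic only; these figures are not stated in the card.
TOTAL_TOKENS = 160e9
mixture = {
    "FineWeb-Edu": 0.40,
    "FineMath-4+": 0.30,
    "InfiWebMath-4+": 0.30,
}
for source, weight in mixture.items():
    print(f"{source}: ~{weight * TOTAL_TOKENS / 1e9:.0f}B tokens")
# FineWeb-Edu: ~64B, FineMath-4+: ~48B, InfiWebMath-4+: ~48B
```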
 
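The generation hunk above shows only an excerpt of the README's snippet. A minimal end-to-end version of that example with the renamed checkpoint might look as follows; the prompt and `max_new_tokens` value are illustrative, not taken from the card.

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/FineMath-Llama-3B"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Illustrative math-flavoured prompt; this is a base model, so it completes
# text rather than following instructions.
inputs = tokenizer("To solve the equation 3x + 5 = 14, we first", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```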
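
Combining the two revision snippets from the card gives a small sketch for listing the intermediate checkpoints and then loading one with tokenizer and weights pinned to the same revision. The branch name `"10B"` is the one used in the card's example; the rest is assumed boilerplate.

```python
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "HuggingFaceTB/FineMath-Llama-3B"

# List every published branch (intermediate checkpoint) of the repository.
refs = list_repo_refs(repo_id)
print([branch.name for branch in refs.branches])

# Pin tokenizer and weights to the same revision so they stay in sync.
revision = "10B"  # revision name taken from the card's example
tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)
```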
 
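Taking the updated Training figures at face value (160k steps, 160B tokens), together with the 4096-token sequence length quoted in the previous revision of the summary, gives a rough per-step budget. This is derived arithmetic under those assumptions, not a configuration stated in the card.

```python
# Rough per-step budget implied by the Training section, assuming the
# 160k-step and 160B-token figures are exact and sequences are 4096 tokens
# (the context length quoted in the previous revision of the card).
total_tokens = 160e9
steps = 160_000
seq_len = 4096

tokens_per_step = total_tokens / steps      # ~1.0M tokens per step
seqs_per_step = tokens_per_step / seq_len   # ~244 sequences per step
print(f"~{tokens_per_step / 1e6:.1f}M tokens/step, ~{seqs_per_step:.0f} sequences/step")
```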