---
library_name: transformers
license: apache-2.0
language:
- en
---

# SmolLM2-1.7B Intermediate Checkpoints

We are releasing intermediate checkpoints of SmolLM2 to enable further research on mechanistic interpretability and learning dynamics. This repo contains a checkpoint every 250,000 steps, which corresponds to ~500B tokens.

## How to Load a Checkpoint
```python
# pip install transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-intermediate-checkpoints"
revision = "step-250000"  # replace with the revision you want

# Pick the best available device: CUDA, then Apple MPS, then CPU
device = torch.device(
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

tokenizer = AutoTokenizer.from_pretrained(checkpoint, revision=revision)
model = AutoModelForCausalLM.from_pretrained(checkpoint, revision=revision).to(device)

inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
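
Each checkpoint is exposed as a separate revision (e.g. `step-250000`). If you are unsure which revisions exist, here is a minimal sketch using `huggingface_hub` to list the repo's branches (this snippet is illustrative and assumes the checkpoint branches follow the `step-*` naming used above):

```python
# Sketch: list the available checkpoint revisions of the repo.
from huggingface_hub import list_repo_refs

refs = list_repo_refs("HuggingFaceTB/SmolLM2-1.7B-intermediate-checkpoints")
step_branches = sorted(
    (b.name for b in refs.branches if b.name.startswith("step-")),
    key=lambda name: int(name.split("-")[1]),
)
print(step_branches)  # e.g. ['step-250000', 'step-500000', ...]
```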

## Training Details
For comprehensive information about SmolLM2 training methodology, please refer to:
- Our [model page](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B)
- Our [GitHub repository](https://github.com/huggingface/smollm)
- Our [paper](https://huggingface.co/papers/2502.02737)

## Checkpoint Details
The global batch size (GBS) of SmolLM2-1.7B is 2,097,152 tokens. To get the number of tokens seen at a given step:
```
nb_tokens = nb_step * GBS
```
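
For example, a minimal sketch of this computation (the step number here is simply parsed from the revision name used in the loading example above):

```python
GBS = 2_097_152  # global batch size of SmolLM2-1.7B, in tokens

revision = "step-250000"                  # checkpoint revision from the loading example
nb_step = int(revision.split("-")[1])     # -> 250000
nb_tokens = nb_step * GBS                 # -> 524,288,000,000, i.e. ~524B tokens
print(f"{revision}: ~{nb_tokens / 1e9:.0f}B tokens")
```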

## License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Citation
```bibtex
@misc{allal2025smollm2smolgoesbig,
      title={SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model},
      author={Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Gabriel Martín Blázquez and Guilherme Penedo and Lewis Tunstall and Andrés Marafioti and Hynek Kydlíček and Agustín Piqueres Lajarín and Vaibhav Srivastav and Joshua Lochner and Caleb Fahlgren and Xuan-Son Nguyen and Clémentine Fourrier and Ben Burtenshaw and Hugo Larcher and Haojun Zhao and Cyril Zakka and Mathieu Morlon and Colin Raffel and Leandro von Werra and Thomas Wolf},
      year={2025},
      eprint={2502.02737},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.02737},
}
```