---
library_name: transformers
license: apache-2.0
language:
- en
---

# SmolLM2-1.7B Intermediate Checkpoints

We are releasing intermediate checkpoints of SmolLM2 to enable further research on mechanistic interpretability and learning dynamics. This repo contains a checkpoint every 250,000 steps, which corresponds to ~500B tokens.

## How to Load a Checkpoint
```python
# pip install transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-intermediate-checkpoints"
revision = "step-250000"  # replace with the revision you want

# Pick the best available device: CUDA, then Apple MPS, then CPU
device = torch.device(
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

tokenizer = AutoTokenizer.from_pretrained(checkpoint, revision=revision)
model = AutoModelForCausalLM.from_pretrained(checkpoint, revision=revision).to(device)

inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
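
Each checkpoint is exposed as a separate revision (e.g. `step-250000`). If you are unsure which revisions exist, here is a minimal sketch using `huggingface_hub` to list the repo's branches (this snippet is illustrative and assumes the checkpoint branches follow the `step-*` naming used above):

```python
# Sketch: list the available checkpoint revisions of the repo.
from huggingface_hub import list_repo_refs

refs = list_repo_refs("HuggingFaceTB/SmolLM2-1.7B-intermediate-checkpoints")
step_branches = sorted(
    (b.name for b in refs.branches if b.name.startswith("step-")),
    key=lambda name: int(name.split("-")[1]),
)
print(step_branches)  # e.g. ['step-250000', 'step-500000', ...]
```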

## Training Details
For comprehensive information about SmolLM2 training methodology, please refer to:
- Our [model page](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B)
- Our [GitHub repository](https://github.com/huggingface/smollm)
- Our [paper](https://huggingface.co/papers/2502.02737)

## Checkpoint Details
The global batch size (GBS) of SmolLM2-1.7B is 2,097,152 tokens. To get the number of tokens seen at a given step:
```
nb_tokens = nb_step * GBS
```
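
For example, a minimal sketch of this computation (the step number here is simply parsed from the revision name used in the loading example above):

```python
GBS = 2_097_152  # global batch size of SmolLM2-1.7B, in tokens

revision = "step-250000"                  # checkpoint revision from the loading example
nb_step = int(revision.split("-")[1])     # -> 250000
nb_tokens = nb_step * GBS                 # -> 524,288,000,000, i.e. ~524B tokens
print(f"{revision}: ~{nb_tokens / 1e9:.0f}B tokens")
```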

## License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Citation
```bibtex
@misc{allal2025smollm2smolgoesbig,
      title={SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model},
      author={Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Gabriel Martín Blázquez and Guilherme Penedo and Lewis Tunstall and Andrés Marafioti and Hynek Kydlíček and Agustín Piqueres Lajarín and Vaibhav Srivastav and Joshua Lochner and Caleb Fahlgren and Xuan-Son Nguyen and Clémentine Fourrier and Ben Burtenshaw and Hugo Larcher and Haojun Zhao and Cyril Zakka and Mathieu Morlon and Colin Raffel and Leandro von Werra and Thomas Wolf},
      year={2025},
      eprint={2502.02737},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.02737},
}
```