---
license: apache-2.0
datasets:
- HuggingFaceTB/finemath
language:
- en
base_model:
- meta-llama/Llama-3.2-3B
---

# Model Card

## Model summary

This model is a continual pre-training of [Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) on a mix of 📐 [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath) (our new high-quality math dataset) and [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).

The model outperforms Llama 3.2 3B on math benchmarks while maintaining similar performance on knowledge, reasoning, and common sense benchmarks:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/HZ6KOc8IVXXOABrdv0dyK.png)

It was trained on **160B tokens** using a mix of 40% FineWeb-Edu and 60% FineMath (30% from the FineMath-4+ subset and 30% from the InfiWebMath-4+ subset). We used [nanotron](https://github.com/huggingface/smollm/tree/main/pre-training/continual-pretraining) for training, and you can find the training scripts in our [SmolLM2 GitHub repo](https://github.com/huggingface/smollm).
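
As a rough illustration of what this mixture means in absolute terms, the sketch below converts the stated percentages into per-source token counts. It assumes the percentages are fractions of the 160B training tokens; the names `TOTAL_TOKENS`, `MIXTURE_PERCENT`, and `token_budget` are hypothetical and not part of the training scripts.

```python
# Approximate token budget implied by the stated mixture.
# Assumption: the percentages are shares of the 160B training tokens.
TOTAL_TOKENS = 160_000_000_000

# Mixture weights in percent, as stated in this model card.
MIXTURE_PERCENT = {
    "FineWeb-Edu": 40,
    "FineMath-4+": 30,
    "InfiWebMath-4+": 30,
}


def token_budget(total_tokens, mixture_percent):
    """Return the number of tokens drawn from each source."""
    if sum(mixture_percent.values()) != 100:
        raise ValueError("mixture percentages must sum to 100")
    return {
        name: total_tokens * percent // 100
        for name, percent in mixture_percent.items()
    }


budget = token_budget(TOTAL_TOKENS, MIXTURE_PERCENT)
for name, tokens in budget.items():
    print(f"{name}: {tokens // 10**9}B tokens")
```

Under that assumption, this works out to about 64B tokens of FineWeb-Edu and 48B tokens from each of the two math subsets.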

## Use

### Intended use

This model was trained on English math data and is not instruction-tuned, so it is intended for text completion in English.

### Generation

```python
# pip install -q transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/FineMath-Llama-3B"
device = "cuda"  # use "cpu" to run without a GPU

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# This is a base model: give it a text prefix to complete, not an instruction.
inputs = tokenizer.encode("Machine Learning is", return_tensors="pt").to(device)
# Without max_new_tokens, generation stops after a short default length.
outputs = model.generate(inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
```

## Intermediate checkpoints

We are releasing intermediate checkpoints for this model every 10,000 training steps (10B tokens), each in a separate branch. Branches are named after the number of tokens seen, for example `10B`.

You can load a specific model revision with `transformers` using the argument `revision`:
```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/FineMath-Llama-3B", revision="10B")
```
You can list all available revisions of the model with the following code:
```python
from huggingface_hub import list_repo_refs
out = list_repo_refs("HuggingFaceTB/FineMath-Llama-3B")
print([b.name for b in out.branches])
```
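
Branch names come back as plain strings, so sorting them alphabetically would place `100B` before `20B`. The sketch below orders checkpoint branches by token count, assuming every checkpoint branch follows the `<N>B` pattern described above. The helper `sort_checkpoint_branches` and the example list are hypothetical; in practice you would pass in the names returned by `list_repo_refs`.

```python
import re


def sort_checkpoint_branches(branch_names):
    """Return branches named like '10B' or '20B', sorted by token count.

    Names that do not match the '<N>B' pattern (for example 'main')
    are skipped.
    """
    pattern = re.compile(r"^(\d+)B$")
    matched = []
    for name in branch_names:
        match = pattern.match(name)
        if match:
            matched.append((int(match.group(1)), name))
    return [name for _, name in sorted(matched)]


# Hypothetical branch list, used only to illustrate the ordering.
example_branches = ["main", "100B", "20B", "10B"]
print(sort_checkpoint_branches(example_branches))  # ['10B', '20B', '100B']
```

The last element of the sorted list is the most-trained intermediate checkpoint and can be passed as `revision` to `from_pretrained`.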

## Training
### Model
- **Architecture**: Llama3  
- **Pretraining steps**: 160k
- **Pretraining tokens**: 160B
- **Precision**: bfloat16
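
The step and token counts above imply a batch size in tokens, which the following back-of-the-envelope sketch works out. It treats 160k steps and 160B tokens as exact, although both are probably rounded, and takes the 10,000-step checkpoint interval from the intermediate checkpoints section. The variable names are illustrative only.

```python
# Back-of-the-envelope figures derived from the numbers in this model card.
# Assumption: "160k" and "160B" are treated as exact values.
PRETRAINING_STEPS = 160_000
PRETRAINING_TOKENS = 160_000_000_000
CHECKPOINT_INTERVAL_STEPS = 10_000

# Tokens consumed per optimizer step (the global batch size in tokens).
tokens_per_step = PRETRAINING_TOKENS // PRETRAINING_STEPS

# Tokens seen between two consecutive checkpoints.
tokens_per_checkpoint = tokens_per_step * CHECKPOINT_INTERVAL_STEPS

# Number of 10,000-step intervals in the full run.
num_intervals = PRETRAINING_STEPS // CHECKPOINT_INTERVAL_STEPS

print(f"tokens per step: {tokens_per_step:,}")
print(f"tokens per checkpoint interval: {tokens_per_checkpoint:,}")
print(f"checkpoint intervals: {num_intervals}")
```

This gives roughly 1M tokens per step and 16 checkpoint intervals. Whether a branch exists for each of those intervals is not stated here, so check the actual branch list before relying on a particular revision.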

### Hardware
- **GPUs**: 64 H100

### Software
- [nanotron](https://github.com/huggingface/nanotron/) for training
- [datatrove](https://github.com/huggingface/datatrove) for tokenization
- [lighteval](https://github.com/huggingface/lighteval) for evaluation
  
## Evaluation
We used the SmolLM2 setup to evaluate all our ablation models with `lighteval`. You can find the details here: https://github.com/huggingface/smollm/tree/main/evaluation#smollm2-base-models

## Limitations
This model was predominantly trained on English math data, potentially limiting its performance in other languages. Furthermore, the model's behavior is influenced by the quality and diversity of its training data, which may include biases and harmful content.