Update README.md
README.md
CHANGED
@@ -5,8 +5,9 @@ datasets:
 ---
 
 # Doge-tokenizer
-Tokenizer for the training model on [smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus)
+Tokenizer for the training model on [smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus), and support reasoning fine-tuning like R1.
+This tokenizer was trained on 2M samples from:
 - FineWeb-Edu 70%
 - Cosmopedia v2 20%
 - Python-Edu 5%
-- FineMath 5%
+- FineMath 5%
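
For reference, the data mix in the updated README can be turned into approximate per-source sample counts. This is a minimal sketch, assuming the percentages apply to the number of samples (not tokens) and that "2M" means exactly 2,000,000; the helper name `samples_per_source` is hypothetical and is not part of the repository.

```python
# Sketch only: split the stated sample budget according to the README's mix.
# Assumption: percentages refer to sample counts, and "2M" is 2,000,000.
TOTAL_SAMPLES = 2_000_000

# Percentages as listed in the README (integers avoid float rounding).
MIX_PERCENT = {
    "FineWeb-Edu": 70,
    "Cosmopedia v2": 20,
    "Python-Edu": 5,
    "FineMath": 5,
}


def samples_per_source(total, mix_percent):
    """Return how many samples each source contributes to the total."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    return {name: total * pct // 100 for name, pct in mix_percent.items()}


counts = samples_per_source(TOTAL_SAMPLES, MIX_PERCENT)
for name, n in counts.items():
    print(f"{name}: {n:,}")
```

Under these assumptions, FineWeb-Edu contributes about 1.4M samples, Cosmopedia v2 about 400K, and Python-Edu and FineMath about 100K each.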