JingzeShi/Doge-tokenizer

Doge-tokenizer

Tokenizer for training models on smollm-corpus. It was trained on 2M samples drawn from the following mixture:

FineWeb-Edu 70%
Cosmopedia v2 20%
Python-Edu 5%
FineMath 5%
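
As a rough illustration of what this mixture means in absolute terms, the sketch below converts the percentages into per-source sample counts. It assumes that "2M" means exactly 2,000,000 samples and that the percentages apply to sample counts rather than tokens or bytes; neither is stated in the card. The names `TOTAL_SAMPLES`, `MIXTURE_PERCENT`, and `samples_per_source` are hypothetical and used only for this example.

```python
# Assumption: "2M" means exactly 2,000,000 samples, and the percentages
# apply to sample counts (not tokens or bytes).
TOTAL_SAMPLES = 2_000_000

# Mixture weights in percent, as listed in the card.
MIXTURE_PERCENT = {
    "FineWeb-Edu": 70,
    "Cosmopedia v2": 20,
    "Python-Edu": 5,
    "FineMath": 5,
}


def samples_per_source(total, mixture_percent):
    """Return the number of samples each source contributes to the total."""
    if sum(mixture_percent.values()) != 100:
        raise ValueError("mixture percentages must sum to 100")
    # Integer arithmetic avoids floating-point rounding.
    return {name: total * pct // 100 for name, pct in mixture_percent.items()}


counts = samples_per_source(TOTAL_SAMPLES, MIXTURE_PERCENT)
for name, n in counts.items():
    print(f"{name}: {n:,}")
```

Under these assumptions, FineWeb-Edu would contribute 1,400,000 samples, Cosmopedia v2 400,000, and Python-Edu and FineMath 100,000 each.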

