Pretrain-Qwen-200M
Pretrain-Qwen-200M is a 200M-parameter model with the Qwen architecture, conventionally pre-trained from scratch on the Pile for 50B tokens.
We also open-source the tokenized pre-training corpus for reproducibility.
It is used as the baseline for MiniPLM-Qwen-200M.
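To put the training budget in perspective, the sketch below works out the tokens-per-parameter ratio and a rough training-compute figure from the two numbers stated above (200M parameters, 50B tokens). The compute estimate assumes the common C ≈ 6·N·D rule of thumb, which this model card does not report; the parameter count is also the rounded nominal figure, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-the-envelope training budget for the run described above.
# Assumption: compute is approximated with the common C ~= 6 * N * D rule of
# thumb (forward + backward FLOPs per parameter per token). This figure is
# NOT reported by the model card; it is only an illustration.

N_PARAMS = 200e6  # nominal 200M parameters (rounded, from the description)
N_TOKENS = 50e9   # 50B pre-training tokens (from the description)


def tokens_per_param(n_params: float, n_tokens: float) -> float:
    """Number of training tokens seen per model parameter."""
    return n_tokens / n_params


def approx_train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs using the 6 * N * D heuristic."""
    return 6.0 * n_params * n_tokens


ratio = tokens_per_param(N_PARAMS, N_TOKENS)
flops = approx_train_flops(N_PARAMS, N_TOKENS)

print(f"tokens per parameter: {ratio:.0f}")
print(f"approx. training FLOPs: {flops:.1e}")
```

Under these assumptions the run sees about 250 tokens per parameter and costs on the order of 6e19 FLOPs.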
Evaluation
MiniPLM models achieve better performance given the same computation and scale well across model sizes.
Other Baselines
Citation
@article{miniplm,
title={MiniPLM: Knowledge Distillation for Pre-Training Language Models},
author={Yuxian Gu and Hao Zhou and Fandong Meng and Jie Zhou and Minlie Huang},
journal={arXiv preprint arXiv:2410.17215},
year={2024}
}