Model Card for Model ID
This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on the trl-lib/Capybara dataset. It has been trained using TRL.
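A minimal usage sketch is given below. The repository id is a placeholder, since this card does not state the actual checkpoint path, and the chat-style input assumes the fine-tuned tokenizer ships a chat template.

```python
# Hedged quick-start sketch: load the fine-tuned checkpoint with the transformers pipeline.
# "your-username/Qwen2.5-0.5B-Capybara" is a placeholder repo id, not the actual model path.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="your-username/Qwen2.5-0.5B-Capybara",  # replace with the real checkpoint
    device_map="auto",
)

question = "If you had a time machine, when and where would you go first?"
output = generator(
    [{"role": "user", "content": question}],  # chat-style input, formatted via the tokenizer's chat template
    max_new_tokens=128,
    return_full_text=False,
)[0]
print(output["generated_text"])
```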
Base model info:
- Type: Causal Language Models
- Training Stage: Pretraining
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings
- Number of Parameters: 0.49B
- Number of Parameters (Non-Embedding): 0.36B
- Number of Layers: 24
- Number of Attention Heads (GQA): 14 for Q and 2 for KV
- Context Length: Full 32,768 tokens
The Qwen team does not recommend using base language models for conversations directly. Instead, post-training, e.g., SFT, RLHF, or continued pretraining, should be applied to the base model first (see the sketch below).
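As a hedged illustration of that post-training step, the following sketch shows how an SFT run with TRL on trl-lib/Capybara might look; the training arguments and output directory are assumptions, not the exact recipe used for this model.

```python
# Illustrative SFT sketch with TRL on trl-lib/Capybara (not the exact settings used for this model).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Capybara is a multi-turn conversational dataset in the standard "messages" format.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",                            # base model named in this card
    train_dataset=dataset,
    args=SFTConfig(output_dir="Qwen2.5-0.5B-Capybara"),   # assumed output directory
)
trainer.train()
```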
For more details on the base model, please refer to the Qwen2.5 blog post, GitHub repository, and documentation.
Requirements
The code of Qwen2.5 has been included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`. With `transformers<4.37.0`, you will encounter the following error:

KeyError: 'qwen2'
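If in doubt, a quick check of the installed version (a minimal sketch, assuming `transformers` and its `packaging` dependency are already installed) can confirm whether the qwen2 architecture is supported:

```python
# Minimal check: Qwen2/Qwen2.5 support requires transformers >= 4.37.0.
from packaging import version
import transformers

print(transformers.__version__)
if version.parse(transformers.__version__) < version.parse("4.37.0"):
    raise RuntimeError("Please upgrade: pip install -U transformers")
```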