image/png

Model Card for Model ID

This model is a fine-tuned version of Qwen/Qwen-0.5B on the trl-lib/Capybara. It has been trained using TRL.

Base Model info;

  • Type: Causal Language Models
  • Training Stage: Pretraining
  • Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings
  • Number of Parameters: 0.49B
  • Number of Paramaters (Non-Embedding): 0.36B
  • Number of Layers: 24
  • Number of Attention Heads (GQA): 14 for Q and 2 for KV
  • Context Length: Full 32,768 tokens

They do not recommend using base language models for conversations. Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., on this model.

For more details, please refer to the blog, GitHub, and Documentation.

Requirements

The code of Qwen2.5 has been in the latest Hugging face transformers and we advise you to use the latest version of transformers.

With transformers<4.37.0, you will encounter the following error:

KeyError: 'qwen2'
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference API
Unable to determine this model’s pipeline type. Check the docs .