---
license: cc-by-4.0
datasets:
  - wikimedia/wikipedia
metrics:
  - perplexity
---

# Kiwi-1.0-0.7B-32k

## Pretrain Model

* **Developed by**: [EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net)
* **Backbone Model**: [Qwen2.5](https://github.com/QwenLM/Qwen2.5)
* **Parameters**: 700M
* **Context Window**: 32k
* **Language(s)**: English
* **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
* **License**: Creative Commons Attribution 4.0 (CC BY 4.0)
* **Contact**: For questions and comments about the model, please reach out via [contact-us](https://chaperoneai.net/contact)

## Main Message

We present our initial results validating depth up-scaling (DUS), a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models. In our approach, we carefully selected the dense layers from Qwen2.5-0.5B to construct our model (a minimal construction sketch appears at the end of this card). Notably, while Qwen2.5-0.5B was trained on *18 trillion* tokens, our model was trained on only *5 billion* tokens (over three orders of magnitude fewer), yet it achieves comparable performance.

**Note**: This model has not yet been instruction-tuned; instruction tuning is an area of ongoing development.

## Evaluation Results

### Perplexity as Evaluation Metric

Perplexity (PPL) is a metric used to evaluate the performance of language models. It measures how well a probability distribution or a language model predicts a sample. A **lower perplexity** score indicates better performance (i.e., the model is more confident in its predictions). Perplexity directly measures a model's ability to predict the next token, providing a clear gauge of its inherent language modeling performance without the influence of instruction tuning. A minimal sliding-window PPL sketch is included at the end of this card.

#### Main Results

Perplexity (PPL) scores:

| Dataset           | Qwen2.5-0.5B | Kiwi-0.7B |
|:-----------------:|:------------:|:---------:|
| **HellaSwag**     | 44.82        | 83.74     |
| **ARC Challenge** | 41.92        | 59.5      |
| **OpenBookQA**    | 152.56       | 323.18    |

## Hardware and Software

* **Hardware**: We utilized an A100 GPU for training our model.
* **Training Factors**: The model was pretrained using a combination of the [DeepSpeed library](https://github.com/microsoft/DeepSpeed) and the [HuggingFace Trainer](https://huggingface.co/docs/transformers/main_classes/trainer); a minimal configuration sketch is given at the end of this card.

## Contact Us

[EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net)

Unlock the full potential of private LLMs for your business with ease. Customize and fine-tune them using your own data for a solution that fits your unique needs. Want a seamless integration? Let’s connect!

► [Get in touch](https://chaperoneai.net/contact)
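
## Appendix: Example Sketches

The sketch below illustrates the general shape of the depth up-scaling step referenced in the Main Message. It is a minimal illustration, not our exact recipe: it loads the Qwen2.5-0.5B backbone, re-inserts a copy of a contiguous block of its decoder layers to obtain a deeper, roughly 0.7B-parameter stack, and saves the result as the initialization for continued pretraining. The duplicated layer range (`dup_start`, `dup_end`) and the output path are placeholders; the layer selection we actually used is not reproduced here.

```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Qwen2.5-0.5B backbone in bf16.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

layers = base.model.layers  # nn.ModuleList of decoder blocks
print(f"Backbone depth: {len(layers)} layers")

# Placeholder split points: keep the full stack and re-insert a copy of a
# middle block of layers. The actual selection used for Kiwi differs.
dup_start, dup_end = 6, 20
new_layers = torch.nn.ModuleList(
    [copy.deepcopy(layer) for layer in layers[:dup_end]]
    + [copy.deepcopy(layer) for layer in layers[dup_start:]]
)

# Re-index attention layers so KV-cache bookkeeping stays consistent.
for idx, layer in enumerate(new_layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = idx

base.model.layers = new_layers
base.config.num_hidden_layers = len(new_layers)

n_params = sum(p.numel() for p in base.parameters()) / 1e6
print(f"Up-scaled model: {len(new_layers)} layers, ~{n_params:.0f}M parameters")

# Save the up-scaled checkpoint as the starting point for continued pretraining.
base.save_pretrained("kiwi-0.7b-init")
tokenizer.save_pretrained("kiwi-0.7b-init")
```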
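
For reference, the following is a minimal sliding-window sketch of how perplexity can be computed with the HuggingFace stack; it is not the exact evaluation harness behind the results table. The model ID, sample text, window size, and stride are placeholders, and the token-weighted averaging is an approximation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model ID; substitute the released Kiwi checkpoint to reproduce its scores.
model_id = "Qwen/Qwen2.5-0.5B"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device).eval()

# Placeholder sample text; an evaluation set would be tokenized the same way.
text = "The quick brown fox jumps over the lazy dog. " * 200
encodings = tokenizer(text, return_tensors="pt")

max_length = 1024   # evaluation window (the model itself supports up to 32k)
stride = 512        # step between windows; earlier tokens serve as context only
seq_len = encodings.input_ids.size(1)

nll_sum, n_tokens, prev_end = 0.0, 0, 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    target_len = end - prev_end  # tokens newly scored in this window
    input_ids = encodings.input_ids[:, begin:end].to(device)
    target_ids = input_ids.clone()
    target_ids[:, :-target_len] = -100  # mask context-only tokens from the loss

    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss  # mean NLL over scored tokens

    nll_sum += loss.item() * target_len  # approximate token-weighted accumulation
    n_tokens += target_len
    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.tensor(nll_sum / n_tokens)).item()
print(f"Perplexity: {ppl:.2f}")
```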
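
Finally, a minimal sketch of continued pretraining with the HuggingFace Trainer and DeepSpeed, assuming the up-scaled checkpoint produced above. The dataset slice, sequence length, hyperparameters, and DeepSpeed config path are all placeholders, not our training recipe.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_path = "kiwi-0.7b-init"  # placeholder: the up-scaled checkpoint from the sketch above
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Placeholder data pipeline: a small slice of English Wikipedia, tokenized for causal LM.
raw = load_dataset("wikimedia/wikipedia", "20231101.en", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

dataset = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="kiwi-0.7b-cpt",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=50,
    save_steps=1000,
    deepspeed="ds_config_zero2.json",  # placeholder DeepSpeed ZeRO config file
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```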