---
license: cc-by-4.0
datasets:
  - wikimedia/wikipedia
metrics:
  - perplexity
---

# Kiwi-1.0-0.7B-32k

## Pretrain Model

* **Developed by**: [EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net)
* **Backbone Model**: [Qwen2.5](https://github.com/QwenLM/Qwen2.5)
* **Parameters**: 700M
* **Context Window**: 32k
* **Language(s)**: English
* **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
* **License**: Creative Commons Attribution 4.0 (CC BY 4.0)
* **Contact**: For questions and comments about the model, please reach out via [contact-us](https://chaperoneai.net/contact)

## Main Message

We present our initial results validating depth up-scaling (DUS), a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models. In our approach, we carefully selected the dense layers from Qwen2.5-0.5B to construct our model (a minimal construction sketch appears at the end of this card). Notably, while Qwen2.5-0.5B was trained on *18 trillion* tokens, our model was trained on only *5 billion* tokens (over three orders of magnitude fewer), yet it achieves comparable performance.

**Note**: This model has not yet been instruction-tuned; instruction tuning is an area of ongoing development.

## Evaluation Results

### Perplexity as Evaluation Metric

Perplexity (PPL) is a metric used to evaluate the performance of language models. It measures how well a probability distribution or a language model predicts a sample. A **lower perplexity** score indicates better performance (i.e., the model is more confident in its predictions). Perplexity directly measures a model's ability to predict the next token, providing a clear gauge of its inherent language modeling performance without the influence of instruction tuning. A minimal sliding-window PPL sketch is included at the end of this card.

#### Main Results

Perplexity (PPL) scores:

| Dataset           | Qwen2.5-0.5B | Kiwi-0.7B |
|:-----------------:|:------------:|:---------:|
| **HellaSwag**     | 44.82        | 83.74     |
| **ARC Challenge** | 41.92        | 59.5      |
| **OpenBookQA**    | 152.56       | 323.18    |

## Hardware and Software

* **Hardware**: We utilized an A100 GPU for training our model.
* **Training Factors**: The model was pretrained using a combination of the [DeepSpeed library](https://github.com/microsoft/DeepSpeed) and the [HuggingFace Trainer](https://huggingface.co/docs/transformers/main_classes/trainer); a minimal configuration sketch is given at the end of this card.

## Contact Us

[EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net)

Unlock the full potential of private LLMs for your business with ease. Customize and fine-tune them using your own data for a solution that fits your unique needs. Want a seamless integration? Let’s connect!

► [Get in touch](https://chaperoneai.net/contact)
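
## Appendix: Example Sketches

The sketch below illustrates the general shape of the depth up-scaling step referenced in the Main Message. It is a minimal illustration, not our exact recipe: it loads the Qwen2.5-0.5B backbone, re-inserts a copy of a contiguous block of its decoder layers to obtain a deeper, roughly 0.7B-parameter stack, and saves the result as the initialization for continued pretraining. The duplicated layer range (`dup_start`, `dup_end`) and the output path are placeholders; the layer selection we actually used is not reproduced here.

```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Qwen2.5-0.5B backbone in bf16.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

layers = base.model.layers  # nn.ModuleList of decoder blocks
print(f"Backbone depth: {len(layers)} layers")

# Placeholder split points: keep the full stack and re-insert a copy of a
# middle block of layers. The actual selection used for Kiwi differs.
dup_start, dup_end = 6, 20
new_layers = torch.nn.ModuleList(
    [copy.deepcopy(layer) for layer in layers[:dup_end]]
    + [copy.deepcopy(layer) for layer in layers[dup_start:]]
)

# Re-index attention layers so KV-cache bookkeeping stays consistent.
for idx, layer in enumerate(new_layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = idx

base.model.layers = new_layers
base.config.num_hidden_layers = len(new_layers)

n_params = sum(p.numel() for p in base.parameters()) / 1e6
print(f"Up-scaled model: {len(new_layers)} layers, ~{n_params:.0f}M parameters")

# Save the up-scaled checkpoint as the starting point for continued pretraining.
base.save_pretrained("kiwi-0.7b-init")
tokenizer.save_pretrained("kiwi-0.7b-init")
```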
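
For reference, the following is a minimal sliding-window sketch of how perplexity can be computed with the HuggingFace stack; it is not the exact evaluation harness behind the results table. The model ID, sample text, window size, and stride are placeholders, and the token-weighted averaging is an approximation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model ID; substitute the released Kiwi checkpoint to reproduce its scores.
model_id = "Qwen/Qwen2.5-0.5B"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device).eval()

# Placeholder sample text; an evaluation set would be tokenized the same way.
text = "The quick brown fox jumps over the lazy dog. " * 200
encodings = tokenizer(text, return_tensors="pt")

max_length = 1024   # evaluation window (the model itself supports up to 32k)
stride = 512        # step between windows; earlier tokens serve as context only
seq_len = encodings.input_ids.size(1)

nll_sum, n_tokens, prev_end = 0.0, 0, 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    target_len = end - prev_end  # tokens newly scored in this window
    input_ids = encodings.input_ids[:, begin:end].to(device)
    target_ids = input_ids.clone()
    target_ids[:, :-target_len] = -100  # mask context-only tokens from the loss

    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss  # mean NLL over scored tokens

    nll_sum += loss.item() * target_len  # approximate token-weighted accumulation
    n_tokens += target_len
    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.tensor(nll_sum / n_tokens)).item()
print(f"Perplexity: {ppl:.2f}")
```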
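
Finally, a minimal sketch of continued pretraining with the HuggingFace Trainer and DeepSpeed, assuming the up-scaled checkpoint produced above. The dataset slice, sequence length, hyperparameters, and DeepSpeed config path are all placeholders, not our training recipe.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_path = "kiwi-0.7b-init"  # placeholder: the up-scaled checkpoint from the sketch above
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Placeholder data pipeline: a small slice of English Wikipedia, tokenized for causal LM.
raw = load_dataset("wikimedia/wikipedia", "20231101.en", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

dataset = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="kiwi-0.7b-cpt",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=50,
    save_steps=1000,
    deepspeed="ds_config_zero2.json",  # placeholder DeepSpeed ZeRO config file
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```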