---
license: apache-2.0
---
# **GeM2-Llamion-14B**
We have released **Llamion** as **GeM 2.0**, the second series of generative models developed by VAIV Company to address our principal business needs.
**Llamion** (Llamafied Orion) is derived by transforming the [Orion model](https://huggingface.co/OrionStarAI/Orion-14B-LongChat)
into [the standard LLaMA architecture](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py)
through parameter mapping and offline knowledge transfer.
Further technical specifications and study results will be detailed in our upcoming paper, which will be made available on this page.
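The exact Orion-to-LLaMA parameter mapping is not published here, but the idea can be illustrated as copying weights from a source checkpoint into a target architecture by renaming state-dict keys. The key names below are hypothetical stand-ins, not the actual mapping:

```python
# Toy sketch of parameter mapping: rename state-dict keys from a source
# naming scheme to a target (LLaMA-style) naming scheme.
# NOTE: these key names are illustrative assumptions, not the real mapping.
KEY_MAP = {
    "transformer.wte.weight": "model.embed_tokens.weight",
    "transformer.ln_f.weight": "model.norm.weight",
}

def remap_state_dict(src):
    """Return a new state dict with keys renamed for the target model.

    Keys without an entry in KEY_MAP are carried over unchanged.
    """
    return {KEY_MAP.get(k, k): v for k, v in src.items()}

src = {
    "transformer.wte.weight": [0.1, 0.2],
    "transformer.ln_f.weight": [1.0],
}
dst = remap_state_dict(src)
print(sorted(dst))  # ['model.embed_tokens.weight', 'model.norm.weight']
```

In practice this renaming is applied to the full checkpoint before loading it into the LLaMA implementation, after which knowledge transfer refines the weights offline.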
![vaiv_png](./vaiv.png)
Notably, the LongChat model supports an extended context length of 200K tokens.
The following figures show the perplexity of the models
on the [English Wikipedia corpus](https://huggingface.co/datasets/wikimedia/wikipedia/viewer/20231101.en)
and the [Korean Wikipedia corpus](https://huggingface.co/datasets/wikimedia/wikipedia/viewer/20231101.ko), respectively.
![ppl_enwiki](./ppl_enwiki.png)
![ppl_kowiki](./ppl_kowiki.png)
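The perplexity reported in plots like these is conventionally the exponential of the mean token-level negative log-likelihood; lower values mean the model assigns higher probability to the held-out text. A minimal sketch of that computation (the per-token NLLs here are synthetic, not from the model):

```python
import math

def perplexity(neg_log_likelihoods):
    """Perplexity = exp(mean per-token negative log-likelihood)."""
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# Sanity check: guessing uniformly over a 4-token vocabulary gives an
# NLL of ln(4) for every token, so perplexity equals the vocabulary size.
nlls = [math.log(4)] * 10
print(round(perplexity(nlls), 6))  # 4.0
```

With a real model, the per-token NLLs would come from the cross-entropy loss over a sliding window of the Wikipedia text.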
### Contributors
- VAIV Company AI Lab ([vaiv.kr](https://www.vaiv.kr/))