---
license: apache-2.0
---
|
|
|
# **GeM2-Llamion-14B** |
|
|
|
We have released **Llamion** as **GeM 2.0**, the second series of generative models developed by VAIV Company to address our principal business needs.
|
|
|
**Llamion** (Llamafied Orion) is derived from transforming the [Orion model](https://huggingface.co/OrionStarAI/Orion-14B-LongChat) |
|
into [the standard LLaMA architecture](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py) |
|
through parameter mapping and offline knowledge transfer. |
|
Further technical specifications and evaluation results will be detailed in our upcoming paper, which will be made available on this page.
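The paper will describe the exact procedure; as a rough illustration, parameter mapping of this kind amounts to carrying tensors over from one architecture's state-dict naming scheme to LLaMA's. The sketch below shows the idea only; all key names in it are hypothetical and do not reflect the actual Orion-to-LLaMA correspondence.

```python
# Illustrative sketch of parameter mapping between architectures:
# tensors are carried over by renaming state-dict keys from a source
# naming scheme to LLaMA's. All key names here are hypothetical.

def map_state_dict(source_state, key_map):
    """Rename keys according to key_map; keep unmapped keys unchanged."""
    return {key_map.get(k, k): v for k, v in source_state.items()}

# Hypothetical correspondence between a source model and LLaMA naming.
KEY_MAP = {
    "transformer.wte.weight": "model.embed_tokens.weight",
    "transformer.ln_f.weight": "model.norm.weight",
}

source = {
    "transformer.wte.weight": "embedding_tensor",
    "transformer.ln_f.weight": "final_norm_tensor",
}
llama_like = map_state_dict(source, KEY_MAP)
print(sorted(llama_like))  # ['model.embed_tokens.weight', 'model.norm.weight']
```

Knowledge transfer on top of such a mapping then adjusts any weights that have no one-to-one structural counterpart.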
|
|
|
![vaiv_png](./vaiv.png) |
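Because Llamion follows the standard LLaMA architecture, it should load through the stock `transformers` code path. A minimal usage sketch follows; the repository id is assumed from the model name and may differ from the actual one on the Hub.

```python
# Minimal usage sketch. The repository id below is assumed from the
# model name and may differ from the actual Hub repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(prompt, model_id="vaiv/GeM2-Llamion-14B"):
    """Load the model and greedily generate a short completion."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("VAIV Company develops"))
```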
|
|
|
Notably, the LongChat model supports a context length of up to 200K tokens.
|
The following figures show the perplexity of models
|
on [English Wikipedia corpus](https://huggingface.co/datasets/wikimedia/wikipedia/viewer/20231101.en) |
|
and [Korean Wikipedia corpus](https://huggingface.co/datasets/wikimedia/wikipedia/viewer/20231101.ko), respectively. |
|
|
|
![ppl_enwiki](./ppl_enwiki.png) |
|
|
|
![ppl_kowiki](./ppl_kowiki.png) |
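Perplexity here is the standard corpus-level measure: the exponential of the average negative log-likelihood per token. A minimal sketch of the computation from token log-probabilities:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Example: a model that assigns probability 0.25 to each of four tokens
# has a perplexity of about 4.
probs = [0.25, 0.25, 0.25, 0.25]
print(perplexity([math.log(p) for p in probs]))  # ≈ 4.0
```

Lower values mean the model assigns higher probability to the reference text, so lower is better in both plots.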
|
|
|
### Contributors |
|
|
|
- VAIV Company AI Lab ([vaiv.kr](https://www.vaiv.kr/)) |