---
license: apache-2.0
---
|
|
|
# **GeM2-Llamion-14B** |
|
|
|
We have released **Llamion** as **GeM 2.0**, the second series of generative models developed by VAIV Company to address our principal business needs.
|
|
|
**Llamion** (Llamafied Orion) is derived from transforming the [Orion model](https://huggingface.co/OrionStarAI/Orion-14B-LongChat) |
|
into [the standard LLaMA architecture](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py) |
|
through parameter mapping and offline knowledge transfer. |
|
Further technical specifications and evaluation results will be detailed in our upcoming paper, which will be made available on this page.
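The paper will describe the exact procedure; as a rough illustration, parameter mapping of this kind amounts to carrying tensors over from one architecture's state-dict naming scheme to LLaMA's. The sketch below shows the idea only; all key names in it are hypothetical and do not reflect the actual Orion-to-LLaMA correspondence.

```python
# Illustrative sketch of parameter mapping between architectures:
# tensors are carried over by renaming state-dict keys from a source
# naming scheme to LLaMA's. All key names here are hypothetical.

def map_state_dict(source_state, key_map):
    """Rename keys according to key_map; keep unmapped keys unchanged."""
    return {key_map.get(k, k): v for k, v in source_state.items()}

# Hypothetical correspondence between a source model and LLaMA naming.
KEY_MAP = {
    "transformer.wte.weight": "model.embed_tokens.weight",
    "transformer.ln_f.weight": "model.norm.weight",
}

source = {
    "transformer.wte.weight": "embedding_tensor",
    "transformer.ln_f.weight": "final_norm_tensor",
}
llama_like = map_state_dict(source, KEY_MAP)
print(sorted(llama_like))  # ['model.embed_tokens.weight', 'model.norm.weight']
```

Knowledge transfer on top of such a mapping then adjusts any weights that have no one-to-one structural counterpart.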
|
|
|
![vaiv_png](./vaiv.png) |
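Because Llamion follows the standard LLaMA architecture, it should load through the stock `transformers` code path. A minimal usage sketch follows; the repository id is assumed from the model name and may differ from the actual one on the Hub.

```python
# Minimal usage sketch. The repository id below is assumed from the
# model name and may differ from the actual Hub repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(prompt, model_id="vaiv/GeM2-Llamion-14B"):
    """Load the model and greedily generate a short completion."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("VAIV Company develops"))
```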
|
|
|
Notably, the LongChat model supports a context length of up to 200K tokens.
|
The following figures show the perplexity of models
|
on [English Wikipedia corpus](https://huggingface.co/datasets/wikimedia/wikipedia/viewer/20231101.en) |
|
and [Korean Wikipedia corpus](https://huggingface.co/datasets/wikimedia/wikipedia/viewer/20231101.ko), respectively. |
|
|
|
![ppl_enwiki](./ppl_enwiki.png) |
|
|
|
![ppl_kowiki](./ppl_kowiki.png) |
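Perplexity here is the standard corpus-level measure: the exponential of the average negative log-likelihood per token. A minimal sketch of the computation from token log-probabilities:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Example: a model that assigns probability 0.25 to each of four tokens
# has a perplexity of about 4.
probs = [0.25, 0.25, 0.25, 0.25]
print(perplexity([math.log(p) for p in probs]))  # ≈ 4.0
```

Lower values mean the model assigns higher probability to the reference text, so lower is better in both plots.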
|
|
|
### Contributors |
|
|
|
- VAIV Company AI Lab ([vaiv.kr](https://www.vaiv.kr/)) |