Chuxin-1.6B-Base / README.md

Update README.md

444ee4e verified 8 months ago

6.28 kB

	---
	license: mit
	---
	# Chuxin-1.6B-Base
	<br>

	## 介绍 (Introduction)

	Chuxin-1.6B-Base是16亿参数规模的模型。Chuxin-1.6B完全基于开源数据构建，在经过超大规模数据训练后，Chuxin-1.6B在各类下游任务上具有非常强的竞争力。

	Chuxin-1.6B-1M是基于Chuxin-1.6B-base模型在1M窗口下训练后的结果，大海捞针实验显示其具有非常强的上下文检索能力。


	如果您想了解更多关于Chuxin-1.6B开源模型的细节，我们建议您参阅我们的[技术报告](https://arxiv.org/pdf/2405.04828)

	Chuxin-1.6B-Base is a model with 1.6 billion parameters. Chuxin-1.6B is built entirely on open-source data. After being trained with large-scale data, Chuxin has very competitive capabilities in various downstream tasks.

	Chuxin-1.6B-1M is the result of training the Chuxin-1.6B-base model with a 1M windows. Experiments such as searching for a needle in a haystack demonstrate its strong contextual retrieval abilities.

	If you would like to learn more about the Chuxin-1.6B open-source model, we suggest you refer to our [technical report](https://arxiv.org/pdf/2405.04828).
	<br>

	## 快速使用（Quickstart）

	您可以通过以下代码轻松调用：

	You can easily call the model with the following code:

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	tokenizer = AutoTokenizer.from_pretrained("chuxin-llm/Chuxin-1.6B-Base", trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained("chuxin-llm/Chuxin-1.6B-Base", device_map="auto", trust_remote_code=True, bf16=True).eval()
	inputs = tokenizer('蒙古国的首都是乌兰巴托（Ulaanbaatar）\n冰岛的首都是雷克雅未克（Reykjavik）\n埃塞俄比亚的首都是', return_tensors='pt')
	inputs = inputs.to(model.device)
	pred = model.generate(**inputs, max_new_tokens=15, do_sample=False)
	print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
	# 蒙古国的首都是乌兰巴托（Ulaanbaatar）\n冰岛的首都是雷克雅未克（Reykjavik）\n埃塞俄比亚的首都是亚的斯亚贝巴（Addis Ababa）...
	```

	## 评测效果（Evaluation）

	### (常识推理和阅读理解) Common Sense Reasoning and Reading Comprehension tasks

	\| Model \| size \| ARC-c \|ARC-e \|Boolq \|Copa \|Hellaswag \|OpenbookQA \|Piqa \|Sciq \|Winogrande \|Avg\|
	\|:--------------\|:----------:\|:-----------:\|:-----------:\|:-----------:\|:-----------:\|:-----------:\|:-----------:\|:-----------:\|:-----------:\|:-----------:\|:-----------:\|
	\| Gemma \| 2B \| 48.98 \| 78.45 \| 69.51 \| 84 \| 71.73 \| 39.8 \| 78.02 \| 94.3 \| 65.51 \| 70.03 \|
	\| H2O-Danube† \| 1.8B \| 35.84 \| 62.29 \| 65.81 \| - \| 68.20 \| 37.6 \| 76.93 \| - \| 61.96 \| - \|
	\| Qwen1.5 \| 1.8B \| 37.03 \| 67.51 \| 66.64 \| 78 \| 61.60 \| 34.40 \| 73.99 \| 93 \| 61.56 \| 63.74 \|
	\| StableLM 2 \| 1.6B \| 43.52 \|69.44 \| 75.5 \| 84 \| 70.3 \| 39.6 \| 76.82 \| 96.1 \| 64.17 \| 68.82 \|
	\| OpenLlama† \| 3B \| 34 \|69\| 68\| -\| 49\| 40\| 75\| -\| 62 \|-\|
	\| CT-LLM \| 2B \| 34.81 \| 65.49 \| 62.45 \| 74 \| 54.77 \| 33.4 \| 71.38 \| 90.6 \| 57.85 \| 60.63 \|
	\| TinyLLama \| 1.1B \| 34.81 \| 67.47 \| 63.15 \| 74 \| 60 \| 34.6 \| 73.12 \| 88.8 \| 58.88 \| 61.64 \|
	\| OLMo \| 1B \| 34.22 \| 67.55 \| 61.4 \| 82 \| 63.96 \| 36.4 \| 75.1 \| 86.7 \| 60.3 \| 63.07 \|
	\| Chuxin-1.6B-Base \| 1.6B \| 39.68 \| 71.38 \| 71.25 \| 83 \| 66.09 \| 35.00 \| 77.09 \| 95 \| 63.54 \| 66.89 \|

	带有†的模型表示我们直接报告了相应论文中的分数，其他的则来自于我们重新测试的结果。

	Models with † denote that we directly report the scores from the corresponding paper, and others are from our implementation.

	### Open LLM LeaderBoard

	\| Model \| size \| ARC-c \|HellaSwag\|MMLU \|TruthfulQA \|Winogrande \|GSM-8k \|Avg \|Avg wo GSM\|
	\|:--------------\|:----------:\|:-----------:\|:-----------:\|:-----------:\|:-----------:\|:-----------:\|:-----------:\|:-----------:\|:-----------:\|
	\| Gemma \| 2B \| 48.98 \| 71.73 \| 42.47 \| 33 \| 65.51 \|10.08\| 45.3 \| 52.34 \|
	\| H2O-Danube \| 1.8B \| 39. 68 \| 69.75 \| 25.97 \| 33.63 \| 64.17\| 2.05 \| 39.21 \|46.64\|
	\| Qwen1.5† \| 1.8B \| 37.88 \| 61.42 \| 46.71 \| 39.43 \| 60.3 \| 33.59 \| 46.55 \| 49.15\|
	\| StableLM 2 \| 1.6B \| 43.52 \|70.3 \| 39.8 \| 36.61 \| 64.17 \| 17.29 \| 45.28 \| 50.88 \|
	\| OpenLlama† \| 3B \| 39.9 \| 71.6 \| 27.1 \| 34.8 \| 67 \| 0.9 \|40.3\|48.08\|
	\| CT-LLM \| 2B \| 34.81 \| 54.77 \| 37.81 \| 39.81 \| 57.85 \| 7.35 \| 38.73 \| 45.01\|
	\| TinyLLama \| 1.1B \| 33.87 \| 60.31 \| 26.04 \| 37.32 \| 59.51 \| 1.44 \| 36.42 \|43.41\|
	\| OLMo \| 1B \| 34.22 \| 63.96 \| 35.44 \| 35.53 \| 62.67 \| 9.86 \| 41.81 \|48.2\|
	\| Chuxin-1.6B-Base \| 1.6B \| 39.68 \| 66.09 \| 41.07 \| 37.65 \| 63.54 \| 12.66 \| 43.45 \|49.61\|

	带有†的模型表示我们直接报告 Open LLM排行榜的分数，其他的则来自于我们重新测试的结果。

	Models with † denote that we directly report the scores from the Open LLM Leaderboard, and others are from our implementation.

	### CMMLU, C-Eval and HumanEval

	\| Model \| size \| C-Eval \|CMMLU\|HUMANEVAL \|
	\|:--------------\|:----------:\|:-----------:\|:-----------:\|:-----------:\|
	\| Gemma \| 2B \| 31 \| 31.06 \| 9.51\|
	\| Qwen1.5 \| 1.8B \| 59.38 \| 57.08 \| 23.17 \|
	\| StableLM 2 \| 1.6B \| 29.27 \|30.1 \| 7.32 \|
	\| CT-LLM \| 2B \| 36.78 \| 36.4 \| 9.15 \|
	\| Chuxin-1.6B-Base \| 1.6B \| 39.31 \| 37.11 \| 9.76 \|

	## 引用 (Citation)

	如果你觉得我们的工作对你有帮助，欢迎引用！

	If you find our work helpful, feel free to give us a cite.

	```
	@article{chuxin,
	title={CHUXIN: 1.6B TECHNICAL REPORT},
	author={Xiaomin Zhuang, Yufan Jiang, Qiaozhi He, Zhihua Wu},
	journal={arXiv preprint arXiv:2405.04828},
	year={2024}
	}
	```
	<br>