+---
+license: mit
+---
+# Chuxin-1.6B-Base
+<br>
+## Introduction
+**Chuxin-1.6B-Base** is a 1.6-billion-parameter model developed by Taichu. It is built entirely on open-source data and, after training on a very large corpus, is competitive with other open models of similar size on a wide range of downstream tasks (see Evaluation below).
+**Chuxin-1.6B-1M** is obtained by further training Chuxin-1.6B-Base with a 1M context window. Needle-in-a-haystack experiments show that it has strong in-context retrieval ability.
+For more details on the Chuxin-1.6B open-source models, please refer to our technical report on [Arxiv](https://xxxx).
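As a rough illustration of what a needle-in-a-haystack test looks like, the sketch below hides one fact (the "needle") at a chosen depth inside repeated filler text and appends a question about it. The helper name `build_haystack_prompt`, the filler sentence, and the needle are all hypothetical; this is not the evaluation setup used for Chuxin-1.6B-1M, only a minimal example of the idea.

```python
def build_haystack_prompt(needle, filler, n_filler, depth, question):
    """Place `needle` at relative position `depth` (0.0 = start, 1.0 = end) among filler sentences."""
    sentences = [filler] * n_filler
    insert_at = round(depth * n_filler)  # index of the needle within the sentence list
    sentences.insert(insert_at, needle)
    return " ".join(sentences) + "\n" + question


needle = "The secret number is 7481."  # hypothetical fact to retrieve
prompt = build_haystack_prompt(
    needle=needle,
    filler="The sky was clear and the market was busy.",
    n_filler=1000,
    depth=0.5,
    question="What is the secret number?",
)
# A retrieval check would send `prompt` to the model and look for "7481" in the completion.
print(len(prompt.split()), prompt.index(needle))
```

Sweeping `n_filler` and `depth` over a grid gives the usual picture of retrieval accuracy as a function of context length and needle position.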
+<br>
+## Quickstart
+You can load the model and generate text with the following code:
+```python
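+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Load the tokenizer and model; trust_remote_code lets the repository's custom code run.
+tokenizer = AutoTokenizer.from_pretrained("chuxin-llm/Chuxin-1.6B-Base", trust_remote_code=True)
+# torch_dtype is the standard transformers argument for loading the weights in bfloat16.
+model = AutoModelForCausalLM.from_pretrained("chuxin-llm/Chuxin-1.6B-Base", device_map="auto", trust_remote_code=True, torch_dtype=torch.bfloat16).eval()
+
+# Few-shot prompt: "The capital of Mongolia is Ulaanbaatar / The capital of Iceland is Reykjavik / The capital of Ethiopia is"
+inputs = tokenizer('蒙古国的首都是乌兰巴托（Ulaanbaatar）\n冰岛的首都是雷克雅未克（Reykjavik）\n埃塞俄比亚的首都是', return_tensors='pt')
+inputs = inputs.to(model.device)
+# Greedy decoding of at most 20 new tokens.
+pred = model.generate(**inputs, max_new_tokens=20, do_sample=False)
+print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
+# 蒙古国的首都是乌兰巴托（Ulaanbaatar）\n冰岛的首都是雷克雅未克（Reykjavik）\n埃塞俄比亚的首都是亚的斯亚贝巴（Addis Ababa）...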
+```
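The Quickstart prompt follows a few-shot completion pattern: two solved "country, capital" lines followed by one unfinished line for the model to complete. A minimal sketch of building such a prompt is shown below; the helper name `build_few_shot_prompt` is hypothetical and not part of the model's API.

```python
def build_few_shot_prompt(examples, query_country):
    """Join solved 'X的首都是Y' lines, then append an unfinished line for the query."""
    lines = [f"{country}的首都是{capital}" for country, capital in examples]
    lines.append(f"{query_country}的首都是")  # left open for the model to complete
    return "\n".join(lines)


examples = [
    ("蒙古国", "乌兰巴托（Ulaanbaatar）"),
    ("冰岛", "雷克雅未克（Reykjavik）"),
]
prompt = build_few_shot_prompt(examples, "埃塞俄比亚")
print(prompt)
```

The resulting string is identical to the prompt passed to the tokenizer in the Quickstart, so it can be used in its place.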
+## Evaluation
+### Common Sense Reasoning and Reading Comprehension
+| Model | Size | ARC-c | ARC-e | BoolQ | COPA | HellaSwag | OpenbookQA | PIQA | SciQ | Winogrande | Avg |
+|:--------------|:----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
+| Gemma | 2B | 48.98 | 78.45 | 69.51 | 84 | 71.73 | 39.8 | 78.02 | 94.3 | 65.51 | 70.03 |
+| H2O-Danube† | 1.8B | 35.84 | 62.29 | 65.81 | - | 68.20 | 37.6 | 76.93 | - | 61.96 | - |
+| Qwen1.5 | 1.8B | 37.03 | 67.51 | 66.64 | 78 | 61.60 | 34.40 | 73.99 | 93 | 61.56 | 63.74 |
+| StableLM 2 | 1.6B | 43.52 | 69.44 | 75.5 | 84 | 70.3 | 39.6 | 76.82 | 96.1 | 64.17 | 68.82 |
+| OpenLLaMA† | 3B | 34 | 69 | 68 | - | 49 | 40 | 75 | - | 62 | - |
+| CT-LLM | 2B | 34.81 | 65.49 | 62.45 | 74 | 54.77 | 33.4 | 71.38 | 90.6 | 57.85 | 60.63 |
+| TinyLlama | 1.1B | 34.81 | 67.47 | 63.15 | 74 | 60 | 34.6 | 73.12 | 88.8 | 58.88 | 61.64 |
+| OLMo | 1B | 34.22 | 67.55 | 61.4 | 82 | 63.96 | 36.4 | 75.1 | 86.7 | 60.3 | 63.07 |
+| Chuxin-1.6B-Base | 1.6B | 39.68 | 71.38 | 71.25 | 83 | 66.09 | 35.00 | 77.09 | 95 | 63.54 | 66.89 |
+† Scores are taken directly from the corresponding paper; all other scores come from our own evaluation runs.
+### Open LLM Leaderboard
+| Model | Size | ARC-c | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM-8k | Avg | Avg w/o GSM-8k |
+|:--------------|:----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
+| Gemma | 2B | 48.98 | 71.73 | 42.47 | 33 | 65.51 | 10.08 | 45.3 | 52.34 |
+| H2O-Danube | 1.8B | 39.68 | 69.75 | 25.97 | 33.63 | 64.17 | 2.05 | 39.21 | 46.64 |
+| Qwen1.5† | 1.8B | 37.88 | 61.42 | 46.71 | 39.43 | 60.3 | 33.59 | 46.55 | 49.15 |
+| StableLM 2 | 1.6B | 43.52 | 70.3 | 39.8 | 36.61 | 64.17 | 17.29 | 45.28 | 50.88 |
+| OpenLLaMA† | 3B | 39.9 | 71.6 | 27.1 | 34.8 | 67 | 0.9 | 40.3 | 48.08 |
+| CT-LLM | 2B | 34.81 | 54.77 | 37.81 | 39.81 | 57.85 | 7.35 | 38.73 | 45.01 |
+| TinyLlama | 1.1B | 33.87 | 60.31 | 26.04 | 37.32 | 59.51 | 1.44 | 36.42 | 43.41 |
+| OLMo | 1B | 34.22 | 63.96 | 35.44 | 35.53 | 62.67 | 9.86 | 41.81 | 48.2 |
+| Chuxin-1.6B-Base | 1.6B | 39.68 | 66.09 | 41.07 | 37.65 | 63.54 | 12.66 | 43.45 | 49.61 |
+† Scores are taken directly from the Open LLM Leaderboard; all other scores come from our own evaluation runs.
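The two average columns in the Open LLM Leaderboard table appear to be plain unweighted means over the benchmark columns, with and without GSM-8k. The sketch below checks this for the Chuxin-1.6B-Base row, using the scores copied from the table; the unweighted-mean rule is an assumption inferred from the numbers, not something the model card states.

```python
# Chuxin-1.6B-Base scores copied from the Open LLM Leaderboard table.
scores = {
    "ARC-c": 39.68,
    "HellaSwag": 66.09,
    "MMLU": 41.07,
    "TruthfulQA": 37.65,
    "Winogrande": 63.54,
    "GSM-8k": 12.66,
}


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


avg = mean(scores.values())
avg_wo_gsm = mean(v for name, v in scores.items() if name != "GSM-8k")
print(f"Avg = {avg:.2f}, Avg without GSM-8k = {avg_wo_gsm:.2f}")
# prints "Avg = 43.45, Avg without GSM-8k = 49.61"
```

Both values match the table row, which supports reading the columns as simple means.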
+### CMMLU, C-Eval and HumanEval
+| Model | Size | C-Eval | CMMLU | HumanEval |
+|:--------------|:----------:|:-----------:|:-----------:|:-----------:|
+| Gemma | 2B | 31 | 31.06 | 9.51 |
+| Qwen1.5 | 1.8B | 59.38 | 57.08 | 23.17 |
+| StableLM 2 | 1.6B | 29.27 | 30.1 | 7.32 |
+| CT-LLM | 2B | 36.78 | 36.4 | 9.15 |
+| Chuxin-1.6B-Base | 1.6B | 39.31 | 37.11 | 9.76 |
+## Citation
+If you find our work helpful, please consider citing it:
+```
+@article{chuxin,
+  title={CHUXIN: 1.6B TECHNICAL REPORT},
+  author={Xiaomin Zhuang and Yufan Jiang and Qiaozhi He and Zhihua Wu},
+  journal={arXiv preprint arXiv:xxx},
+  year={2024}
+}
+```
+<br>