Update README.md

README.md
@@ -65,7 +65,7 @@ print(sequences[0]['generated_text'])
 
 ## Evaluation
 
-We test the performance of our model with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). The evaluation results on common datasets are shown below. We test on AI2 Reasoning Challenge (25-shot), HellaSwag (10-shot), MMLU (5-shot), and Winogrande (5-shot).
+We test the performance of our model with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). The evaluation results on common datasets are shown below. We test on AI2 Reasoning Challenge (25-shot), HellaSwag (10-shot), MMLU (5-shot), and Winogrande (5-shot). We release Moxin-7B-finetuned as our base model. We further finetune the base model on Tulu v2 to obtain our chat model.
 
 | Models | ARC-C | Hellaswag | MMLU | WinoGrade | Ave |
 |:----------------------:|:-----:|:---------:|:-----:|:---------:|:-----:|
@@ -100,7 +100,15 @@ We also test the zero shot performance on AI2 Reasoning Challenge (0-shot), AI2
 | Moxin-7B-finetune | 80.03 | 75.17 | 82.24 | 81.12 | 58.64 | 75.44 |
 
 
-
+## Citation
+```
+@article{zhao2024fully,
+  title={Fully Open Source Moxin-7B Technical Report},
+  author={Zhao, Pu and Shen, Xuan and Kong, Zhenglun and Shen, Yixin and Chang, Sung-En and Rupprecht, Timothy and Lu, Lei and Nan, Enfu and Yang, Changdi and He, Yumei and others},
+  journal={arXiv preprint arXiv:2412.06845},
+  year={2024}
+}
+```
 
 
 
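The evaluation setup described in the added paragraph is scriptable. Below is a minimal sketch of how the per-benchmark shot counts could be run with lm-evaluation-harness's Python API; it assumes a v0.4+ install (where `lm_eval.simple_evaluate` exists), and the model id, batch size, and per-task loop are illustrative assumptions, not the authors' published recipe.

```python
# Sketch: reproduce the few-shot benchmarks from the README with
# EleutherAI's lm-evaluation-harness (v0.4+ Python API assumed).
import lm_eval

# Placeholder -- substitute the actual Moxin-7B checkpoint path
# or Hugging Face repo id.
MODEL_ID = "path/or/hf-id-of-moxin-7b"

# Shot counts quoted in the README's Evaluation section.
TASK_SHOTS = {
    "arc_challenge": 25,  # AI2 Reasoning Challenge (25-shot)
    "hellaswag": 10,      # HellaSwag (10-shot)
    "mmlu": 5,            # MMLU (5-shot)
    "winogrande": 5,      # Winogrande (5-shot)
}

for task, shots in TASK_SHOTS.items():
    # num_fewshot applies to every task in a call, so run one task at a time.
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={MODEL_ID}",
        tasks=[task],
        num_fewshot=shots,
        batch_size=8,
    )
    print(task, results["results"][task])
```

The CLI equivalent (e.g. `lm_eval --model hf --model_args pretrained=<model-id> --tasks arc_challenge --num_fewshot 25`) should report the same ARC-C metric, again assuming a v0.4+ install of the harness.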