Update README.md
README.md CHANGED
@@ -14,6 +14,89 @@ This model is a Mixture of Experts (MoE) made with [mergekit](https://github.com/
* [maywell/PiVoT-0.1-Starling-LM-RP](https://huggingface.co/maywell/PiVoT-0.1-Starling-LM-RP)
* [WizardLM/WizardMath-7B-V1.1](https://huggingface.co/WizardLM/WizardMath-7B-V1.1)

## 🏆 Evaluation

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|--------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[Beyonder-4x7B-v2](https://huggingface.co/shadowml/Beyonder-4x7B-v2)| 45.29| 75.95| 60.86| 46.4| 57.13|
|[NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B)| 43.67| 73.24| 55.37| 41.76| 53.51|
|[OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B)| 42.75| 72.99| 52.99| 40.94| 52.42|
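The card does not say which evaluation harness produced these numbers, but the per-suite tables below use the task names and metrics (`acc`/`acc_norm`, `mc1`/`mc2`, `multiple_choice_grade`) emitted by EleutherAI's lm-evaluation-harness; the `agieval_*` and `bigbench_*` tasks appear to come from community forks of it. As a rough, hypothetical reproduction sketch for the GPT4All suite only (assuming a v0.3-style harness; the few-shot setting and batch size are guesses, not taken from the card):

```python
# Hypothetical reproduction sketch, NOT the script used for this card.
# Assumes lm-evaluation-harness v0.3.x, where the GPT4All-suite task names below exist.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=shadowml/Beyonder-4x7B-v2",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],  # the GPT4All suite reported below
    num_fewshot=0,   # assumption; the card does not state the few-shot setting
    batch_size=8,    # assumption
)
print(results["results"])  # per-task acc / acc_norm, as in the tables below
```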
### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |23.62|± | 2.67|
| | |acc_norm|23.62|± | 2.67|
|agieval_logiqa_en | 0|acc |41.47|± | 1.93|
| | |acc_norm|43.01|± | 1.94|
|agieval_lsat_ar | 0|acc |23.04|± | 2.78|
| | |acc_norm|23.48|± | 2.80|
|agieval_lsat_lr | 0|acc |51.57|± | 2.22|
| | |acc_norm|52.94|± | 2.21|
|agieval_lsat_rc | 0|acc |64.31|± | 2.93|
| | |acc_norm|64.68|± | 2.92|
|agieval_sat_en | 0|acc |79.13|± | 2.84|
| | |acc_norm|79.13|± | 2.84|
|agieval_sat_en_without_passage| 0|acc |43.20|± | 3.46|
| | |acc_norm|43.20|± | 3.46|
|agieval_sat_math | 0|acc |34.55|± | 3.21|
| | |acc_norm|32.27|± | 3.16|

Average: 45.29%

### GPT4All
| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |61.86|± | 1.42|
| | |acc_norm|64.51|± | 1.40|
|arc_easy | 0|acc |85.06|± | 0.73|
| | |acc_norm|82.45|± | 0.78|
|boolq | 1|acc |88.35|± | 0.56|
|hellaswag | 0|acc |68.04|± | 0.47|
| | |acc_norm|85.12|± | 0.36|
|openbookqa | 0|acc |37.80|± | 2.17|
| | |acc_norm|48.60|± | 2.24|
|piqa | 0|acc |83.08|± | 0.87|
| | |acc_norm|83.95|± | 0.86|
|winogrande | 0|acc |78.69|± | 1.15|

Average: 75.95%

### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |44.55|± | 1.74|
| | |mc2 |60.86|± | 1.57|

Average: 60.86%

### Bigbench
| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|58.95|± | 3.58|
|bigbench_date_understanding | 0|multiple_choice_grade|66.40|± | 2.46|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|48.84|± | 3.12|
|bigbench_geometric_shapes | 0|multiple_choice_grade|22.56|± | 2.21|
| | |exact_str_match |13.37|± | 1.80|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|30.40|± | 2.06|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|20.57|± | 1.53|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|52.00|± | 2.89|
|bigbench_movie_recommendation | 0|multiple_choice_grade|44.40|± | 2.22|
|bigbench_navigate | 0|multiple_choice_grade|52.10|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|69.75|± | 1.03|
|bigbench_ruin_names | 0|multiple_choice_grade|55.36|± | 2.35|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|23.65|± | 1.35|
|bigbench_snarks | 0|multiple_choice_grade|77.35|± | 3.12|
|bigbench_sports_understanding | 0|multiple_choice_grade|73.02|± | 1.41|
|bigbench_temporal_sequences | 0|multiple_choice_grade|46.80|± | 1.58|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.08|± | 1.17|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|19.03|± | 0.94|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|52.00|± | 2.89|

Average: 46.4%

Average score: 57.13%
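On how the summary figures are derived: each suite score is the plain mean of the primary metric of the tasks in its table (`acc_norm` where reported, otherwise `acc`; `multiple_choice_grade` for Bigbench), and the overall score is the mean of the four suite scores. The metric-selection rule is inferred from the numbers, not stated on the card. A quick sanity check:

```python
# Sanity check of the averages reported above (values copied from the GPT4All table;
# acc_norm is used where reported, plain acc otherwise, an inferred rule).
gpt4all = [64.51, 82.45, 88.35, 85.12, 48.60, 83.95, 78.69]
print(sum(gpt4all) / len(gpt4all))    # ~75.95, matching "Average: 75.95%"

suites = [45.29, 75.95, 60.86, 46.4]  # AGIEval, GPT4All, TruthfulQA, Bigbench
print(sum(suites) / len(suites))      # 57.125, reported as 57.13%
```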
## 🧩 Configuration

```yaml