Update README.md
README.md (CHANGED)
@@ -30,6 +30,8 @@ It is based on a merge of the following models using [LazyMergekit](https://cola
 
 Special thanks to [Jon Durbin](https://huggingface.co/jondurbin), [Intel](https://huggingface.co/Intel), and [Argilla](https://huggingface.co/argilla) for the preference datasets.
 
+**Try the demo**: https://huggingface.co/spaces/mlabonne/AlphaMonarch-7B-GGUF-Chat
+
 ## 🔍 Applications
 
 This model uses a context window of 8k. I recommend using it with the Mistral Instruct chat template (works perfectly with LM Studio).
@@ -44,7 +46,7 @@ It is one of the very best 7B models in terms of instruction following and reaso
 
 ### Nous
 
+AlphaMonarch-7B is the best-performing 7B model on Nous' benchmark suite (evaluation performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval)). See the entire leaderboard [here](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).
 
 | Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
 |---|---:|---:|---:|---:|---:|
@@ -60,9 +62,9 @@ The evaluation was performed using [LLM AutoEval](https://github.com/mlabonne/ll
 
 ### EQ-bench
 
+AlphaMonarch-7B also outperforms 70B and 120B parameter models on [EQ-bench](https://eqbench.com/) by [Samuel J. Paech](https://twitter.com/sam_paech), who kindly ran the evaluations.
 
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/dnCFxieqLiAC3Ll6CfdZW.png)
 
 ### MT-Bench
 
@@ -71,32 +73,34 @@ AlphaMonarch-7B is the second best-performing 7B model on [EQ-bench](https://eqb
                            score
 model                turn
 gpt-4                1     8.95625
+OmniBeagle-7B        1     8.31250
 AlphaMonarch-7B      1     8.23750
 claude-v1            1     8.15000
+NeuralMonarch-7B     1     8.09375
 gpt-3.5-turbo        1     8.07500
 claude-instant-v1    1     7.80000
 
 ########## Second turn ##########
                            score
 model                turn
 gpt-4                2     9.025000
 claude-instant-v1    2     8.012658
+OmniBeagle-7B        2     7.837500
 gpt-3.5-turbo        2     7.812500
 claude-v1            2     7.650000
 AlphaMonarch-7B      2     7.618750
+NeuralMonarch-7B     2     7.375000
 
 ########## Average ##########
                            score
 model
 gpt-4                8.990625
+OmniBeagle-7B        8.075000
 gpt-3.5-turbo        7.943750
 AlphaMonarch-7B      7.928125
 claude-instant-v1    7.905660
 claude-v1            7.900000
+NeuralMonarch-7B     7.734375
 NeuralBeagle14-7B    7.628125
 ```
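The Applications section above recommends an 8k context window and the Mistral Instruct chat template. As a minimal usage sketch (not taken from the card itself), the snippet below loads the model with 🤗 Transformers and lets the tokenizer's chat template build the prompt; the repo id `mlabonne/AlphaMonarch-7B`, the float16 / `device_map="auto"` loading options, and the sampling settings are assumptions rather than values stated in the README.

```python
# Minimal sketch: chat with the model via the tokenizer's built-in chat template.
# Assumptions (not from the README): repo id "mlabonne/AlphaMonarch-7B",
# float16 weights, device_map="auto", and the sampling parameters below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/AlphaMonarch-7B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Assumes the tokenizer ships a Mistral-Instruct-style chat template, as the
# card recommends; otherwise, format the prompt with [INST] ... [/INST] tags.
messages = [{"role": "user", "content": "Explain what a model merge is in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,   # well within the 8k context mentioned above
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

LM Studio users of the GGUF build behind the demo Space can follow the card's recommendation directly and select the Mistral Instruct template when loading the model.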