Update README.md
README.md CHANGED
@@ -213,6 +213,48 @@ hf (pretrained=fblgit/LUNA-SOLARkrautLM-Instruct,dtype=float16), gen_kwargs: (),
 | - social_sciences|N/A |none | 5|acc |0.7501|± |0.0684|
 | - stem |N/A |none | 5|acc |0.5569|± |0.1360|
 ```
+### MT-Bench
+```
+########## Average ##########
+score
+model
+gpt-4 8.990625
+gpt-3.5-turbo 7.943750
+claude-instant-v1 7.905660
+claude-v1 7.900000
+UNA-SOLAR-10.7B-Instruct-v1.0 7.521875
+LUNA-SOLARkrautLM-Instruct 7.462500
+vicuna-33b-v1.3 7.121875
+wizardlm-30b 7.009375
+Llama-2-70b-chat 6.856250
+Llama-2-13b-chat 6.650000
+guanaco-33b 6.528125
+tulu-30b 6.434375
+guanaco-65b 6.409375
+oasst-sft-7-llama-30b 6.409375
+palm-2-chat-bison-001 6.400000
+mpt-30b-chat 6.393750
+vicuna-13b-v1.3 6.387500
+wizardlm-13b 6.353125
+Llama-2-7b-chat 6.268750
+vicuna-7b-v1.3 5.996875
+baize-v2-13b 5.750000
+nous-hermes-13b 5.553459
+mpt-7b-chat 5.459119
+gpt4all-13b-snoozy 5.452830
+koala-13b 5.350000
+mpt-30b-instruct 5.218750
+falcon-40b-instruct 5.168750
+h2ogpt-oasst-open-llama-13b 4.625000
+alpaca-13b 4.531250
+chatglm-6b 4.500000
+oasst-sft-4-pythia-12b 4.318750
+rwkv-4-raven-14b 3.984375
+dolly-v2-12b 3.275000
+fastchat-t5-3b 3.040625
+stablelm-tuned-alpha-7b 2.753125
+llama-13b 2.606250
+```
 
 ## Disclaimer
 We must inform users that despite our best efforts in data cleansing, the possibility of uncensored content slipping through cannot be entirely ruled out.