laineyyy committed on
Commit
2767d58
1 Parent(s): 5da6497

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -54,10 +54,12 @@ We relied on the popular MTBench benchmark to evaluate multi-turn performance.
 
  Since MTBench is an English only benchmark, we also release this fork of [MTBench Finnish](https://github.com/LumiOpen/FastChat/tree/main/fastchat/llm_judge) with multilingual support and machine translated Finnish prompts. Our scores for both benchmarks follow.
 
+ Note: Updated on 18 June 2024
+
  | Eval | Overall | Coding | Extraction | Humanities | Math | Reasoning | Roleplay | STEM | Writing |
  | :---- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | ----: |
- | MTBench English | 6.16 | 3.65 | 6.55 | 9.6 | 2.25 | 4.25 | 7.25 | 7.42 | 8.37 |
- | MTBench Finnish | 5.73 | 3.05 | 6.05 | 9.6 | 1.25 | 3.65 | 7.0 | 7.65 | 7.6 |
+ | MTBench English | 6.13 | 4.25 | 6.65 | 9.60 | 2.30 | 4.30 | 7.05 | 7.55 | 7.35 |
+ | MTBench Finnish | 6.06 | 3.70 | 6.37 | 9.25 | 1.20 | 4.35 | 7.35 | 7.80 | 8.50 |
 
 
  ## License