Update README.md
README.md
CHANGED
@@ -54,10 +54,12 @@ We relied on the popular MTBench benchmark to evaluate multi-turn performance.
 
 Since MTBench is an English only benchmark, we also release this fork of [MTBench Finnish](https://github.com/LumiOpen/FastChat/tree/main/fastchat/llm_judge) with multilingual support and machine translated Finnish prompts. Our scores for both benchmarks follow.
 
+Note: Updated on 18 June 2024
+
 | Eval | Overall | Coding | Extraction | Humanities | Math | Reasoning | Roleplay | STEM | Writing |
 | :---- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | ----: |
-| MTBench English | 6.
-| MTBench Finnish |
+| MTBench English | 6.13 | 4.25 | 6.65 | 9.60 | 2.30 | 4.30 | 7.05 | 7.55 | 7.35 |
+| MTBench Finnish | 6.06 | 3.70 | 6.37 | 9.25 | 1.20 | 4.35 | 7.35 | 7.80 | 8.50 |
 
 
 ## License