Update README.md
We evaluate the alignment performance of FLM-2-52B-Instruct-2407 in Chinese across various domains using [AlignBench](https://arxiv.org/pdf/2311.18743), a comprehensive, multidimensional benchmark for assessing the alignment of Chinese large language models. It comprises 8 categories with a total of 683 question-answer pairs, covering fundamental language ability (Fund.), Chinese advanced understanding (Chi.), open-ended questions (Open.), writing ability (Writ.), logical reasoning (Logi.), mathematics (Math.), task-oriented role playing (Role.), and professional knowledge (Pro.).
| Models                  |   Overall   |    Math.    |    Logi.    |  Fund.  | Chi. | Open. | Writ. | Role. | Pro. |
| ----------------------- | :---------: | :---------: | :---------: | :-----: | :--: | :---: | :---: | :---: | :--: |
| gpt-4-1106-preview      |  **7.58**   |  **7.39**   |  **6.83**   | **7.69** | <u>7.07</u> | **8.66** | **8.23** | **8.08** | **8.55** |
| gpt-4-0613              | <u>6.83</u> | <u>6.33</u> | <u>5.15</u> |  7.16   | 6.76 | 7.26  | 7.31  | 7.48  | 7.56 |
| gpt-3.5-turbo-0613      |    5.68     |    4.90     |    4.79     |  6.01   | 5.60 | 6.97  | 7.27  | 6.98  | 6.29 |
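As a rough illustration of how the Overall column relates to the per-category scores: AlignBench appears to report Overall as the macro average of a reasoning track (Math., Logi.) and a language track (the other six categories), rather than a flat mean over all eight. The sketch below is illustrative only — the grouping is our reading of the benchmark, not code from this repository — but it does reproduce the 7.58 shown above for gpt-4-1106-preview:

```python
# Per-category scores for gpt-4-1106-preview, taken from the table above.
# AlignBench grades each category on a 1-10 scale.
reasoning = {"Math.": 7.39, "Logi.": 6.83}
language = {"Fund.": 7.69, "Chi.": 7.07, "Open.": 8.66,
            "Writ.": 8.23, "Role.": 8.08, "Pro.": 8.55}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Macro average: mean of the reasoning-track mean and the language-track mean.
overall = mean([mean(reasoning.values()), mean(language.values())])
print(f"Overall: {overall:.2f}")  # -> Overall: 7.58
```

A flat mean over all eight categories would give about 7.81 instead, which is why the two-track macro average seems to be the aggregation in use.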