Update README.md
We evaluate the alignment performance of FLM-2-52B-Instruct-2407 in Chinese across various domains using [AlignBench](https://arxiv.org/pdf/2311.18743), a comprehensive, multidimensional benchmark for assessing the alignment of Chinese large language models. It comprises 8 categories with a total of 683 question-answer pairs, covering fundamental language ability (Fund.), Chinese advanced understanding (Chi.), open-ended questions (Open.), writing ability (Writ.), logical reasoning (Logi.), mathematics (Math.), task-oriented role playing (Role.), and professional knowledge (Pro.).
| Models                  |   Overall   |    Math.    |    Logi.    |  Fund.  | Chi. | Open. | Writ. | Role. | Pro. |
| ----------------------- | :---------: | :---------: | :---------: | :-----: | :--: | :---: | :---: | :---: | :--: |
| gpt-4-1106-preview      |  **7.58**   |  **7.39**   |  **6.83**   | **7.69** | <u>7.07</u> | **8.66** | **8.23** | **8.08** | **8.55** |
| gpt-4-0613              | <u>6.83</u> | <u>6.33</u> | <u>5.15</u> |  7.16   | 6.76 | 7.26  | 7.31  | 7.48  | 7.56 |
| gpt-3.5-turbo-0613      |    5.68     |    4.90     |    4.79     |  6.01   | 5.60 | 6.97  | 7.27  | 6.98  | 6.29 |
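As a rough illustration of how the Overall column relates to the per-category scores: AlignBench appears to report Overall as the macro average of a reasoning track (Math., Logi.) and a language track (the other six categories), rather than a flat mean over all eight. The sketch below is illustrative only — the grouping is our reading of the benchmark, not code from this repository — but it does reproduce the 7.58 shown above for gpt-4-1106-preview:

```python
# Per-category scores for gpt-4-1106-preview, taken from the table above.
# AlignBench grades each category on a 1-10 scale.
reasoning = {"Math.": 7.39, "Logi.": 6.83}
language = {"Fund.": 7.69, "Chi.": 7.07, "Open.": 8.66,
            "Writ.": 8.23, "Role.": 8.08, "Pro.": 8.55}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Macro average: mean of the reasoning-track mean and the language-track mean.
overall = mean([mean(reasoning.values()), mean(language.values())])
print(f"Overall: {overall:.2f}")  # -> Overall: 7.58
```

A flat mean over all eight categories would give about 7.81 instead, which is why the two-track macro average seems to be the aggregation in use.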