Commit 8498f6d by sleepylx
1 Parent(s): 578e781

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -44,8 +44,8 @@ print(tokenizer.decode(response.cpu()[0], skip_special_tokens=True))
 
  We evaluate the alignment performance of FLM-2-52B-Instruct-2407 in Chinese across various domains utilizing [AlignBench](https://arxiv.org/pdf/2311.18743). AlignBench is a comprehensive and multidimensional evaluation benchmark designed to assess Chinese large language models’ alignment performance. It encompasses 8 categories with a total of 683 question-answer pairs, covering areas such as fundamental language ability (Fund.), Chinese advanced understanding (Chi.), open-ended questions (Open.), writing ability (Writ.), logical reasoning (Logi.), mathematics (Math.), task-oriented role playing (Role.), and professional knowledge (Pro.).
 
- | Models | Overall | Math. | Logi. | Fund. | Chi. | Open. | Writ. | Role. | Pro. |
- | ----------------------- | ------- | ----- | ----- | ----- | ---- | ----- | ----- | ----- | ---- |
+ | Models | Overall | Math. | Logi. | Fund. | Chi. | Open. | Writ. | Role. | Pro. |
+ | ----------------------- | :-------: | :-----: | :-----: | :-----: | :----: | :-----: | :-----: | :-----: | :----: |
  | gpt-4-1106-preview | **7.58** | **7.39** | **6.83** | **7.69** |<u>7.07</u>| **8.66** | **8.23** | **8.08** | **8.55** |
  | gpt-4-0613 | <u>6.83</u> |<u>6.33</u>|<u>5.15</u>| 7.16 | 6.76 | 7.26 | 7.31 | 7.48 | 7.56 |
  | gpt-3.5-turbo-0613 | 5.68 | 4.90 | 4.79 | 6.01 | 5.60 | 6.97 | 7.27 | 6.98 | 6.29 |