jianqing666 committed
Commit 6bd3cb0 · verified · Parent: 1a9c4c0

Update README.md

Files changed (1): README.md (+6, -6)
README.md CHANGED

@@ -38,7 +38,7 @@ Models output text only.
 | GPT-4 | 74.08 | 65.06 | 72.50 | 85.67 | 57.76 | 84.06 | 79.43 |
 
 <!-- Benchmark evaluation on [Arabic MMLU](https://github.com/FreedomIntelligence/AceGPT) are conducted using accuracy scores as metrics, following the evaluation framework available at https://github.com/FreedomIntelligence/AceGPT/tree/main. -->
-| | STEM | Humanities | Social Sciences | Others | Average |
+<!-- | | STEM | Humanities | Social Sciences | Others | Average |
 |------------------|------|------|------|------|------|
 | Bloomz-7B-base | 33.35 | 29.29 | 37.58 | 34.53 | 33.69 |
 | LLaMA2-7B-base | 30.30 | 29.33 | 27.46 | 30.78 | 29.37 |
@@ -49,11 +49,11 @@ Models output text only.
 | Jais-30B-v1-base | 32.67 | 30.67 | 42.13 | 39.60 | 36.27 |
 | ChatGPT 3.5 Turbo | **43.38** | **44.12** | **55.57** | **53.21** | **49.07** |
 
-<!-- | AceGPT-13B-base | 36.60 | 38.74 | 43.76 | <u>42.72</u> | 40.45 | -->
-<!-- | AceGPT-7B-base | 29.73 | 30.95 | 33.45 | 34.42 | 32.14 | -->
+| AceGPT-13B-base | 36.60 | 38.74 | 43.76 | <u>42.72</u> | 40.45 |
+| AceGPT-7B-base | 29.73 | 30.95 | 33.45 | 34.42 | 32.14 | -->
 
 
-Benchmark evaluation on [ArabicMMLU]((https://github.com/mbzuai-nlp/ArabicMMLU)), and assessed based on its source settings.
+<!-- Benchmark evaluation on [ArabicMMLU]((https://github.com/mbzuai-nlp/ArabicMMLU)), and assessed based on its source settings.
 | | STEM | Social Sciences | Humanities | Arabic Language | Other | Average |
 |------------------|------|------|------|------|------|------|
 | Bloomz-7B-base | - | - | - | - | - | - |
@@ -65,8 +65,8 @@ Benchmark evaluation on [ArabicMMLU]((https://github.com/mbzuai-nlp/ArabicMMLU)), and assessed based on its source settings.
 | Jais-30B-v1-base | 39.5 | 45.6 | <u>50.5</u> | 34.6 | 49.1 | 44.8 |
 | ChatGPT 3.5 Turbo | **53.8** | **57.0** | **57.5** | **57.6** | **63.8** | **57.7** |
 
-<!-- | AceGPT-7B-base | 35.4 | 35.9 | 36.2 | 31.1 | 41.7 | 36.3 |
-| AceGPT-13B-base | <u>42.7</u> | 45.5 | 48.3 | 42.4 | 50.7 | 46.1 | -->
+| AceGPT-7B-base | 35.4 | 35.9 | 36.2 | 31.1 | 41.7 | 36.3 |
+| AceGPT-13B-base | <u>42.7</u> | 45.5 | 48.3 | 42.4 | 50.7 | 46.1 | --> -->
 
 ## Samples
 #### Sample1(abstract_algebra)