jianqing666
committed on
Update README.md
README.md
CHANGED
@@ -38,7 +38,7 @@ Models output text only.
 | GPT-4 | 74.08 | 65.06 | 72.50 | 85.67 | 57.76 | 84.06 | 79.43 |
 
 <!-- Benchmark evaluation on [Arabic MMLU](https://github.com/FreedomIntelligence/AceGPT) are conducted using accuracy scores as metrics, following the evaluation framework available at https://github.com/FreedomIntelligence/AceGPT/tree/main. -->
-| | STEM | Humanities | Social Sciences | Others | Average |
+<!-- | | STEM | Humanities | Social Sciences | Others | Average |
 |------------------|------|------|------|------|------|
 | Bloomz-7B-base | 33.35 | 29.29 | 37.58 | 34.53 | 33.69 |
 | LLaMA2-7B-base | 30.30 | 29.33 | 27.46 | 30.78 | 29.37 |
@@ -49,11 +49,11 @@ Models output text only.
 | Jais-30B-v1-base | 32.67 | 30.67 | 42.13 | 39.60 | 36.27 |
 | ChatGPT 3.5 Turbo | **43.38** | **44.12** | **55.57** | **53.21** | **49.07** |
 
-
-
+| AceGPT-13B-base | 36.60 | 38.74 | 43.76 | <u>42.72</u> | 40.45 |
+| AceGPT-7B-base | 29.73 | 30.95 | 33.45 | 34.42 | 32.14 | -->
 
 
-Benchmark evaluation on [ArabicMMLU]((https://github.com/mbzuai-nlp/ArabicMMLU)), and assessed based on its source settings.
+<!-- Benchmark evaluation on [ArabicMMLU]((https://github.com/mbzuai-nlp/ArabicMMLU)), and assessed based on its source settings.
 | | STEM | Social Sciences | Humanities | Arabic Language | Other | Average |
 |------------------|------|------|------|------|------|------|
 | Bloomz-7B-base | - | - | - | - | - | - |
@@ -65,8 +65,8 @@ Benchmark evaluation on [ArabicMMLU]((https://github.com/mbzuai-nlp/ArabicMMLU))
 | Jais-30B-v1-base | 39.5 | 45.6 | <u>50.5</u> | 34.6 | 49.1 | 44.8 |
 | ChatGPT 3.5 Turbo | **53.8** | **57.0** | **57.5** | **57.6** | **63.8** | **57.7** |
 
-
-| AceGPT-13B-base | <u>42.7</u> | 45.5 | 48.3 | 42.4 | 50.7 | 46.1 | -->
+| AceGPT-7B-base | 35.4 | 35.9 | 36.2 | 31.1 | 41.7 | 36.3 |
+| AceGPT-13B-base | <u>42.7</u> | 45.5 | 48.3 | 42.4 | 50.7 | 46.1 | --> -->
 
 ## Samples
 #### Sample1(abstract_algebra)