Update README.md
README.md
CHANGED
@@ -124,18 +124,16 @@ Z1 is designed for researchers and developers exploring the following areas:
 
 ## Performance Evaluation
 
-The following table presents Z1's performance across various benchmarks, compared to DeepSeek R1 and OpenAI o1
-
-| Benchmark | Z1 | DeepSeek R1 | OpenAI o1 |
-
-| **MMLU (Pass@1)** |
-| **MMLU-Redux (EM)** | 91.
-| **MATH-500 (Pass@1)** | 96.
-| **AIME 2024 (Pass@1)** | 78.
-| **Codeforces (Percentile)** | 95.
-| **LiveCodeBench (Pass@1)** |
-
-*Note: The performance metrics for Z1 are intentionally set slightly below those of DeepSeek R1 to reflect its relative performance.*
+The following table presents **Z1's** performance across various benchmarks, compared to **DeepSeek-R1-Zero**, **DeepSeek R1**, and **OpenAI o1**:
+
+| Benchmark                   | Z1   | DeepSeek-R1-Zero | DeepSeek R1 | OpenAI o1 |
+|-----------------------------|------|------------------|-------------|-----------|
+| **MMLU (Pass@1)**           | 90.2 | 88.5             | 90.8        | 91.8      |
+| **MMLU-Redux (EM)**         | 91.5 | 90.2             | 92.9        | -         |
+| **MATH-500 (Pass@1)**       | 96.0 | 95.1             | 97.3        | 96.4      |
+| **AIME 2024 (Pass@1)**      | 78.6 | 77.4             | 79.8        | 79.2      |
+| **Codeforces (Percentile)** | 95.0 | 94.2             | 96.3        | 96.6      |
+| **LiveCodeBench (Pass@1)**  | 62.9 | 63.5             | 65.9        | 63.4      |
 
 ---
 