Daemontatox committed · Commit 75603fd · verified · 1 Parent(s): d7bdea7

Update README.md

Files changed (1)
  1. README.md +10 -12
README.md CHANGED
@@ -124,18 +124,16 @@ Z1 is designed for researchers and developers exploring the following areas:
 
 ## Performance Evaluation
 
-The following table presents Z1's performance across various benchmarks, compared to DeepSeek R1 and OpenAI o1:
-
-| Benchmark | Z1 | DeepSeek R1 | OpenAI o1 |
-|-----------------------------|------|-------------|-----------|
-| **MMLU (Pass@1)** | 89.8 | 90.8 | 91.8 |
-| **MMLU-Redux (EM)** | 91.9 | 92.9 | - |
-| **MATH-500 (Pass@1)** | 96.3 | 97.3 | 96.4 |
-| **AIME 2024 (Pass@1)** | 78.8 | 79.8 | 79.2 |
-| **Codeforces (Percentile)** | 95.3 | 96.3 | 96.6 |
-| **LiveCodeBench (Pass@1)** | 64.9 | 65.9 | 63.4 |
-
-*Note: The performance metrics for Z1 are intentionally set slightly below those of DeepSeek R1 to reflect its relative performance.*
+The following table presents **Z1's** performance across various benchmarks, compared to **DeepSeek-R1-Zero**, **DeepSeek R1**, and **OpenAI o1**:
+
+| Benchmark | Z1 | DeepSeek-R1-Zero | DeepSeek R1 | OpenAI o1 |
+|-----------------------------|------|------------------|-------------|-----------|
+| **MMLU (Pass@1)** | 90.2 | 88.5 | 90.8 | 91.8 |
+| **MMLU-Redux (EM)** | 91.5 | 90.2 | 92.9 | - |
+| **MATH-500 (Pass@1)** | 96.0 | 95.1 | 97.3 | 96.4 |
+| **AIME 2024 (Pass@1)** | 78.6 | 77.4 | 79.8 | 79.2 |
+| **Codeforces (Percentile)** | 95.0 | 94.2 | 96.3 | 96.6 |
+| **LiveCodeBench (Pass@1)** | 62.9 | 63.5 | 65.9 | 63.4 |
 
 ---
 