Update README.md
README.md CHANGED
```diff
@@ -205,7 +205,7 @@ All models are evaluated in chat mode (e.g. with the respective conversation template)
 
 | Model                       | Size   | HumanEval+ pass@1 |
 |-----------------------------|--------|-------------------|
-| **OpenChat
+| **OpenChat-3.5-0106**       | **7B** | **65.9**          |
 | ChatGPT (December 12, 2023) | -      | 64.6              |
 | WizardCoder-Python-34B-V1.0 | 34B    | 64.6              |
 | OpenChat 3.5 1210           | 7B     | 63.4              |
@@ -215,7 +215,7 @@ All models are evaluated in chat mode (e.g. with the respective conversation template)
 <h3>OpenChat-3.5 vs. Grok</h3>
 </div>
 
-🔥 OpenChat-3.5
+🔥 OpenChat-3.5-0106 (7B) now outperforms Grok-0 (33B) on **all 4 benchmarks** and Grok-1 (???B) on average and **3/4 benchmarks**.
 
 |                       | License     | # Param | Average  | MMLU   | HumanEval | MATH     | GSM8k    |
 |-----------------------|-------------|---------|----------|--------|-----------|----------|----------|
```
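
For context on the HumanEval+ numbers above: the diff's hunk header notes that all models are evaluated in chat mode, i.e. each benchmark prompt is wrapped in the model's own conversation template before generation. Below is a minimal sketch of what that looks like with Hugging Face `transformers`; the model id, generation settings, and single-greedy-sample scoring are illustrative assumptions, not the repository's actual evaluation harness.

```python
# Minimal sketch (assumptions, not the repo's harness): generate one
# chat-mode completion per HumanEval+ problem, then score pass@1 as the
# fraction of problems whose single completion passes all tests.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openchat/openchat-3.5-0106"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def chat_mode_completion(problem_prompt: str) -> str:
    """Wrap the raw benchmark prompt in the model's conversation template."""
    messages = [{"role": "user", "content": problem_prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

def pass_at_1(results: list[bool]) -> float:
    """With one greedy sample per problem, pass@1 is simply the pass rate."""
    return sum(results) / len(results)
```

With greedy decoding the single sample is deterministic, so pass@1 reduces to the plain pass rate; an evaluation that draws multiple stochastic samples per problem would use the unbiased pass@k estimator instead.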