Update README.md
README.md CHANGED
@@ -100,7 +100,7 @@ Here are some benchmark results, computed using the the LM Evaluation Harness wi
 | Model | GSM8K (strict, 5-shot) | ARC-c (acc_norm, 25-shot) |
 |:--------------:|-----------------------:|--------------------------:|
 | SFT | 24.34% | 42.92% |
-| Masked Thought | 24.18% |
+| Masked Thought | 24.18% | *43.60%* |
 | **ReMask** | **27.90%** | 43.26% |
 
 As I expected, it improves GSM8K doesn't do much to ARC.