Update README.md
README.md (changed)
@@ -111,7 +111,7 @@ As I expected, it improves GSM8K, but doesn't do much to ARC.
 - Training sequence length: 256
 - Input masking probability: 40%
 - Label masking probability: 10%
-- Answer-only (full rationale masking) probability: 10%
+- Answer-only (full rationale label masking) probability: 10%
 - Batch size: 16, accumulated to 256
 - Epochs: 6
 - Learning rate: 1e-5
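The hyperparameters in this hunk name the knobs but not how they are applied. The sketch below illustrates two of them under stated assumptions. It is hypothetical and is not the author's training code. `TrainConfig`, `build_labels`, and `IGNORE_INDEX = -100` are illustrative names. The value -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, a common convention but not confirmed by the README. The sketch assumes that "accumulated to 256" means 16 micro-batches of 16 examples per optimizer step. It also assumes that answer-only (full rationale label masking) replaces every rationale label with the ignore index, so the loss covers only the answer tokens. Input masking and per-token label masking are recorded as values but not modeled, because the README does not say how they work.

```python
import random
from dataclasses import dataclass

# Assumption: -100 as the "ignore this label" value (PyTorch CrossEntropyLoss default).
IGNORE_INDEX = -100


@dataclass
class TrainConfig:
    """Hypothetical container for the README's hyperparameters."""
    seq_len: int = 256
    input_mask_prob: float = 0.40   # recorded only; application not specified in README
    label_mask_prob: float = 0.10   # recorded only; application not specified in README
    answer_only_prob: float = 0.10  # chance of masking all rationale labels
    micro_batch_size: int = 16
    effective_batch_size: int = 256
    epochs: int = 6
    learning_rate: float = 1e-5

    @property
    def grad_accum_steps(self) -> int:
        # Assumed meaning of "16, accumulated to 256": 256 // 16 = 16 micro-batches per step.
        if self.effective_batch_size % self.micro_batch_size != 0:
            raise ValueError("effective batch size must be a multiple of the micro-batch size")
        return self.effective_batch_size // self.micro_batch_size


def build_labels(rationale_ids, answer_ids, cfg, rng):
    """Hypothetical: build target labels for a rationale followed by an answer.

    With probability cfg.answer_only_prob, every rationale label is replaced by
    IGNORE_INDEX (answer-only / full rationale label masking), so only the
    answer tokens would contribute to the loss.
    """
    labels = list(rationale_ids) + list(answer_ids)
    if rng.random() < cfg.answer_only_prob:
        for i in range(len(rationale_ids)):
            labels[i] = IGNORE_INDEX
    return labels


if __name__ == "__main__":
    cfg = TrainConfig()
    print("grad accumulation steps:", cfg.grad_accum_steps)
    # Force answer-only masking to show its effect deterministically.
    forced = TrainConfig(answer_only_prob=1.0)
    print(build_labels([11, 12, 13], [21, 22], forced, random.Random(0)))
```

Setting `answer_only_prob` to 1.0 or 0.0 makes the masking decision deterministic, which is convenient for checking the behaviour. At the README's 10%, roughly one in ten examples would be trained on the answer alone.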