Update README.md
README.md
@@ -81,7 +81,7 @@ The model was trained on a subset of [FineWeb-edu](https://huggingface.co/datase
 - Activations quantized to 8-bit precision

 10. **Key Findings**
-  - Warmup quantization (linear lambda scheduler) proved crucial for performance
+  - Warmup quantization (sigmoid or linear lambda scheduler) proved crucial for performance

 These 10B token training runs showed that it's possible to effectively fine-tune pre-trained models to 1.58-bit precision, achieving strong performance with relatively limited additional training data.

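For context, the edit adds a sigmoid option alongside the linear lambda scheduler used to warm up quantization. Below is a minimal sketch of what such a schedule and the blended quantization step can look like; the function names (`linear_lambda`, `sigmoid_lambda`, `warmup_quantize`), the steepness parameter `k`, and the absmean ternary rounding are illustrative assumptions, not this repository's actual code:

```python
import math

import torch


def linear_lambda(step: int, warmup_steps: int) -> float:
    """Linear schedule: lambda ramps from 0 to 1 over the warmup steps."""
    return min(step / warmup_steps, 1.0)


def sigmoid_lambda(step: int, warmup_steps: int, k: float = 10.0) -> float:
    """Sigmoid schedule: a smooth S-shaped ramp centered mid-warmup."""
    if step >= warmup_steps:
        return 1.0
    return 1.0 / (1.0 + math.exp(-k * (step / warmup_steps - 0.5)))


def warmup_quantize(w: torch.Tensor, lam: float) -> torch.Tensor:
    """Blend full-precision weights with their ternary (1.58-bit) version.

    At lam=0 the forward pass uses full-precision weights; at lam=1 it uses
    fully quantized weights, so training eases into quantization.
    """
    # Absmean scaling followed by rounding to the ternary set {-1, 0, +1}.
    scale = w.abs().mean().clamp(min=1e-5)
    w_q = (w / scale).round().clamp(-1, 1) * scale
    # Straight-through-style blend: gradients flow through the fp weights.
    return w + lam * (w_q - w).detach()
```

In use, `lam = sigmoid_lambda(step, warmup_steps)` (or the linear variant) would be recomputed each training step and applied to every quantized linear layer, so early updates see mostly full-precision weights and later updates see fully ternary ones.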