Update README.md
README.md
CHANGED
@@ -17,7 +17,7 @@ Memphis-CoT is a finetune of [StableLM 3b 4e1t](stabilityai/stablelm-3b-4e1t) on
 Finetuning was performed using my [supertrainer2000](https://github.com/euclaise/supertrainer2000) framework, using my Adalite optimizer.
 
 
-
+## Training Procedure
 I finetuned the model using an iterative rationale-bootstrapping procedure inspired by [STaR](https://research.google/pubs/star-self-taught-reasoner-bootstrapping-reasoning-with-reasoning/) and [SPIN](https://arxiv.org/abs/2401.01335)
 
 First, I finetuned the model on all the datasets using a [MixCE](https://arxiv.org/abs/2305.16958) loss and [NEFTune](https://arxiv.org/abs/2310.05914), for 2 epochs.
@@ -28,7 +28,7 @@ I then performed the following steps 3 times:
 
 This should be more efficient than either STaR or SPIN, as it uses a ranking loss rather than rejection sampling (unlike STaR), and verifies correctness instead of assuming all model responses are incorrect (unlike SPIN).
 
-
+## Hyperparameters
 
 For the initial supervised finetuning step:
 - Adalite optimizer, default hyperparameters of supertrainer2000 unless otherwise specified
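For readers unfamiliar with NEFTune, which the README text in this diff names as part of the initial supervised finetuning step, here is a minimal sketch of the noise it adds to token embeddings during training. It is not taken from supertrainer2000. The function name `neftune_noise`, the plain list-of-lists representation, and the value `alpha=5.0` are illustrative assumptions; the README does not state which noise scale was used.

```python
import math
import random


def neftune_noise(embeddings, alpha=5.0, rng=None):
    """Return a noisy copy of `embeddings` (a seq_len x dim list of lists).

    NEFTune draws noise uniformly from [-1, 1] per element and scales it
    by alpha / sqrt(seq_len * dim). `alpha` here is an assumed value,
    not the one used for this model.
    """
    rng = rng or random.Random()
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    # Build new rows so the caller's embeddings are left unmodified.
    return [
        [value + rng.uniform(-1.0, 1.0) * scale for value in row]
        for row in embeddings
    ]
```

The noise is applied only while training; at inference time the embeddings are used unchanged.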