euclaise committed on
Commit f742604 · verified · 1 Parent(s): f58fb74

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -17,7 +17,7 @@ Memphis-CoT is a finetune of [StableLM 3b 4e1t](stabilityai/stablelm-3b-4e1t) on
 Finetuning was performed using my [supertrainer2000](https://github.com/euclaise/supertrainer2000) framework, using my Adalite optimizer.
 
 
-### Training Procedure
+## Training Procedure
 I finetuned the model using an iterative rationale-bootstrapping procedure inspired by [STaR](https://research.google/pubs/star-self-taught-reasoner-bootstrapping-reasoning-with-reasoning/) and [SPIN](https://arxiv.org/abs/2401.01335)
 
 First, I finetuned the model on all the datasets using a [MixCE](https://arxiv.org/abs/2305.16958) loss and [NEFTune](https://arxiv.org/abs/2310.05914), for 2 epochs.
@@ -28,7 +28,7 @@ I then performed the following steps 3 times:
 
 This should be more efficient than either STaR or SPIN, as it uses a ranking loss rather than rejection sampling (unlike STaR), and verifies correctness instead of assuming all model responses are incorrect (unlike SPIN).
 
-### Hyperparameters
+## Hyperparameters
 
 For the initial supervised finetuning step:
 - Adalite optimizer, default hyperparameters of supertrainer2000 unless otherwise specified
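
Note on the NEFTune step mentioned in the README text: as a rough illustration only, NEFTune perturbs the input embeddings during finetuning with uniform noise scaled by `alpha / sqrt(seq_len * dim)`. The sketch below uses plain Python lists rather than supertrainer2000, whose actual implementation is not shown in this diff. The function name `neftune_noise` and the value `alpha=5.0` are assumptions chosen for the example, not settings taken from this commit.

```python
import math
import random


def neftune_noise(embeddings, alpha=5.0, rng=None):
    """Return a noisy copy of a (seq_len x dim) embedding matrix.

    Each entry receives independent uniform noise in [-s, s], where
    s = alpha / sqrt(seq_len * dim). alpha=5.0 is a placeholder value
    (an assumption), not a hyperparameter taken from the README.
    """
    rng = rng or random.Random()
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    # Build a new matrix so the caller's embeddings stay untouched.
    return [[x + rng.uniform(-1.0, 1.0) * scale for x in row]
            for row in embeddings]


# Usage: 4 tokens with 4-dimensional embeddings, all zeros for clarity.
clean = [[0.0] * 4 for _ in range(4)]
noisy = neftune_noise(clean, alpha=5.0, rng=random.Random(0))
```

The noise is applied only at training time; at inference the embeddings are used unperturbed.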