euclaise committed on
Commit f58fb74 · verified · 1 Parent(s): c55db1a

Create README.md

Files changed (1)
  1. README.md +62 -0
README.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ license: cc-by-sa-3.0
+ datasets:
+ - euclaise/TinyCoT
+ - euclaise/reddit-instruct
+ - sablo/oasst2_curated
+ library_name: transformers
+ tags:
+ - supertrainer2000
+ ---
+
+
+ Memphis-CoT is a finetune of [StableLM 3b 4e1t](https://huggingface.co/stabilityai/stablelm-3b-4e1t) on [TinyCoT](https://huggingface.co/datasets/euclaise/TinyCoT), along with [reddit-instruct](https://huggingface.co/datasets/euclaise/reddit-instruct) and a [curated](https://huggingface.co/datasets/sablo/oasst2_curated) subset of [oasst2](https://huggingface.co/datasets/OpenAssistant/oasst2).
+
+ **Memphis was trained *only* on human data! No GPT generations here.**
+
+ Finetuning was performed with my [supertrainer2000](https://github.com/euclaise/supertrainer2000) framework, using my Adalite optimizer.
+
+
+ ### Training Procedure
+ I finetuned the model using an iterative rationale-bootstrapping procedure inspired by [STaR](https://research.google/pubs/star-self-taught-reasoner-bootstrapping-reasoning-with-reasoning/) and [SPIN](https://arxiv.org/abs/2401.01335).
+
+ First, I finetuned the model on all the datasets using a [MixCE](https://arxiv.org/abs/2305.16958) loss and [NEFTune](https://arxiv.org/abs/2310.05914), for 2 epochs.
+
+ I then performed the following steps 3 times:
+ 1. Generate responses for each question in TinyCoT using the current model, check each response for correctness, and create a dataset of (correct, incorrect) pairs. Surplus responses are discarded, so that each correct response and each incorrect response is used only once.
+ 2. Finetune the model for 1 epoch using a ranking loss over the length-normalized log-probabilities of each sequence, similar to [Preference Ranking Optimization](https://arxiv.org/abs/2306.17492), comparing the correct and incorrect generated responses. A standard cross-entropy (CE) loss over the ground truth was included to prevent excessive drift.
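A minimal sketch of the loss in step 2, written in plain Python on lists of per-token log-probabilities so that it runs without a deep learning framework. It is not the supertrainer2000 implementation. The function names are hypothetical, the two-candidate softmax form of the ranking term is an assumption based on the Preference Ranking Optimization paper, and multiplying the ranking term by the weight before adding it to the CE term is likewise an assumption.

```python
import math


def length_normalized_logprob(token_logprobs):
    """Mean per-token log-probability of one sequence."""
    return sum(token_logprobs) / len(token_logprobs)


def pairwise_rank_loss(correct_logprobs, incorrect_logprobs):
    """Two-candidate ranking loss on length-normalized scores.

    Equals -log(exp(s_pos) / (exp(s_pos) + exp(s_neg))),
    which simplifies to softplus(s_neg - s_pos).
    """
    s_pos = length_normalized_logprob(correct_logprobs)
    s_neg = length_normalized_logprob(incorrect_logprobs)
    diff = s_neg - s_pos
    # Numerically stable softplus(diff).
    return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def combined_loss(correct_logprobs, incorrect_logprobs, truth_logprobs,
                  rank_weight=5.0):
    """CE on the ground-truth sequence plus the weighted ranking term.

    rank_weight=5.0 mirrors the "rank loss weight of 5" listed in the
    hyperparameters; how the weight is applied is an assumption.
    """
    ce = -length_normalized_logprob(truth_logprobs)
    rank = pairwise_rank_loss(correct_logprobs, incorrect_logprobs)
    return ce + rank_weight * rank
```

When the model scores both responses equally, the ranking term is log 2. It shrinks toward zero as the correct response becomes more likely than the incorrect one.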
+
+ This should be more efficient than either STaR or SPIN: unlike STaR, it uses a ranking loss rather than rejection sampling, and unlike SPIN, it verifies correctness instead of assuming all model responses are incorrect.
+
+ ### Hyperparameters
+
+ For the initial supervised finetuning step:
+ - Adalite optimizer, default hyperparameters of supertrainer2000 unless otherwise specified
+ - Lambda (Adalite's analogue to weight decay) of 0.01
+ - LR of 1e-5
+ - MixCE ratio of 0.75
+ - Sequence length of 4096
+ - Cosine decay with a 20% warmup
+ - Frozen embeddings
+ - No training on inputs
+ - Accumulated batch size of 128
+ - NEFTune with an alpha of 10
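To make the schedule entry concrete, here is a small sketch of a cosine decay with warmup. The README does not say how the warmup is shaped or where the decay ends, so this sketch assumes a linear warmup over the first 20% of steps and a decay to zero. `lr_at_step` is a hypothetical helper, not part of supertrainer2000.

```python
import math


def lr_at_step(step, total_steps, peak_lr=1e-5, warmup_frac=0.2):
    """Learning rate at a 0-indexed step.

    Assumes linear warmup to peak_lr, then cosine decay to zero.
    """
    warmup_steps = int(total_steps * warmup_frac)
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

The same function covers the rank finetuning stage by passing `peak_lr=5e-7` and `warmup_frac=0.1`.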
+
+ For the generations:
+ - Generated using the current git version of `vllm`
+ - N=8
+ - Temperature of 0.5
+ - `top_p` of 0.8
+ - Maximum of 512 generated tokens, discarding responses that do not have a valid rationale and answer
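The filtering and pairing of sampled responses could look roughly like the sketch below. It is an illustration under assumptions, not the script used for this model: responses are assumed to be already parsed into `(rationale, answer)` tuples, correctness is assumed to be an exact match against a reference answer, and `build_pairs` is a hypothetical name. The keys of `SAMPLING` mirror keyword arguments of `vllm.SamplingParams`, but the sketch does not import `vllm`.

```python
# Sampling settings from the list above, as a plain dict.
SAMPLING = {"n": 8, "temperature": 0.5, "top_p": 0.8, "max_tokens": 512}


def build_pairs(samples, reference_answer):
    """Turn the sampled responses for one question into (correct, incorrect) pairs.

    samples: list of (rationale, answer) tuples; a missing or empty
    rationale or answer marks the response as invalid.
    """
    seen = set()
    correct, incorrect = [], []
    for rationale, answer in samples:
        if not rationale or not answer:
            continue  # no valid rationale and answer: discard
        if (rationale, answer) in seen:
            continue  # keep each response only once
        seen.add((rationale, answer))
        if answer == reference_answer:
            correct.append((rationale, answer))
        else:
            incorrect.append((rationale, answer))
    # zip stops at the shorter list, so surplus responses are dropped.
    return list(zip(correct, incorrect))
```

A question whose samples are all correct, or all incorrect, contributes no pairs under this scheme.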
+
+ For the rank finetuning:
+ - Adalite optimizer, default hyperparameters of supertrainer2000 unless otherwise specified
+ - Lambda of 0.01
+ - LR of 5e-7
+ - Rank loss weight of 5
+ - Sequence length of 1024
+ - Cosine schedule with 10% warmup
+ - Frozen embeddings
+ - No training on inputs
+ - Accumulated batch size of 128
+ - NEFTune with an alpha of 10
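For readers unfamiliar with NEFTune, the sketch below shows how the alpha setting scales the noise, following the rule in the NEFTune paper: uniform noise in [-1, 1] multiplied by alpha / sqrt(L * d), where L is the sequence length and d the embedding dimension. It uses nested Python lists in place of tensors. `neftune_noise` is a hypothetical helper and not the supertrainer2000 code.

```python
import math
import random


def neftune_noise(embeddings, alpha=10.0, rng=None):
    """Return a noised copy of a [seq_len][dim] embedding matrix.

    Each element receives Uniform(-1, 1) noise scaled by
    alpha / sqrt(seq_len * dim). Applied only during training.
    """
    rng = rng or random.Random()
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[x + rng.uniform(-1.0, 1.0) * scale for x in row]
            for row in embeddings]
```

Because the scale divides by sqrt(L * d), longer sequences and wider embeddings receive smaller per-element noise for the same alpha.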