Sao10K committed on
Commit e7316e6
1 Parent(s): 57bd7a1

Update README.md

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -37,6 +37,9 @@ Relevant Axolotl Configurations:
  <br>-> Taken from [winglian/Llama-3-8b-64k-PoSE](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE)
  <br>\- I tried to find my own configs; after hours of tinkering, the one he used still worked best, so I stuck with it.
  <br>\- 2M Rope Theta had the best loss results during training compared to other values.
+ <br>\- Leaving it at 500K Rope Theta wasn't much worse, but 4M and 8M made the grad_norm values worse even though loss dropped fast.
+ <br>\- Mixing in pretraining data was a PITA; it made formatting a lot worse. -> Tried low-ratio mixes, e.g. <20% and lower.
+ <br>\- An improper / bad Rope Theta shows up as Grad_Norm exploding into the thousands. It'll settle back to low values alright, but it's a scarily fast drop even with gradient clipping (see the sketch below).
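For context on those last notes, here is a minimal, hedged sketch in plain transformers / PyTorch (not the actual Axolotl run, whose config follows below) of the two knobs being discussed: overriding `rope_theta` when loading the model, and logging the pre-clip gradient norm returned by `clip_grad_norm_`, which is where a bad theta shows up even though the applied update gets clipped. The model id, clip value, learning rate, and training loop are illustrative placeholders, not the settings used here.

```python
# Hedged sketch, not the author's actual setup: override rope_theta on the
# Llama 3 config and watch the pre-clip gradient norm during training.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B"  # assumed base model, placeholder

config = AutoConfig.from_pretrained(model_id)
config.rope_theta = 2_000_000  # 2M theta, per the note above

model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype=torch.bfloat16
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # placeholder LR

def training_step(batch):
    """One step with gradient clipping; returns (loss, pre-clip grad norm)."""
    outputs = model(**batch)  # batch: input_ids / attention_mask / labels
    outputs.loss.backward()
    # clip_grad_norm_ returns the total norm *before* clipping, so a bad
    # rope_theta still shows up here as norms in the thousands even though
    # the update actually applied is clipped.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item(), grad_norm.item()
```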

  ```
  sequence_len: 8192