Training code?
#3 by flashvenom - opened
If you don't mind sharing, what code was used to train the model, both for building the dataset and for increasing the context length? On context length, have you tested how well it works past 2k tokens?
Responded to your other issue: no, I goofed and didn't fully test. Contexts up to about 2200-2300 tokens seem to work, but yeah, total fail on longer ones.
jondurbin changed discussion status to closed