Training code?

#3
by flashvenom - opened

If you don't mind sharing, what code did you use to train the model, both for training on the dataset and for increasing the context length? On context length, have you tested how well it works beyond 2k tokens?

I responded to your other issue: no, I goofed and didn't fully test it. Contexts up to about 2200-2300 tokens seem to work, but yeah, it fails completely on longer ones.

jondurbin changed discussion status to closed
