Batching token length
When training BLOOM, how many tokens can the input be? 2048?
2048
reference: https://arxiv.org/pdf/2211.05100.pdf
Actually, it was trained with a sequence length of 2048, but the model supports any length; you can try generating more and more tokens with no hard limit (with some performance degradation as the length grows). That's linked to our use of ALiBi.
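For example, here's a minimal sketch of generating past the 2048-token training length, assuming the Hugging Face transformers API and the smaller bigscience/bloom-560m checkpoint (chosen here just for illustration; the prompt and sampling settings are mine):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # any BLOOM checkpoint behaves the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The history of machine translation begins with"
inputs = tokenizer(prompt, return_tensors="pt")

# Nothing in the model hard-caps generation at the 2048-token training length;
# ALiBi's positional bias extends to arbitrary lengths (quality may degrade,
# and this is slow on CPU).
outputs = model.generate(
    **inputs,
    max_new_tokens=2500,  # deliberately pushes total length past 2048
    do_sample=True,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```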
How much degradation? Are you saying I can put 100,000 words in one training example?
Thanks!
How much degradation depends on your setup.
For more explanation of what ALiBi is: https://arxiv.org/abs/2108.12409
For some plots showing how well it holds up on long sequences, we have a preliminary result (on a 1B model) in https://arxiv.org/abs/2210.15424 (Figure 2).
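Roughly, ALiBi replaces learned position embeddings with a fixed, head-specific linear penalty on the attention scores based on query/key distance, which is why it extends past the training length. A small sketch of building those biases (my own illustration, not the actual BLOOM implementation):

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Build the (num_heads, seq_len, seq_len) ALiBi bias added to attention scores.

    Head h (0-indexed) gets slope m = 2 ** (-8 * (h + 1) / num_heads); the bias for
    a query at position i attending to a key at position j <= i is -m * (i - j).
    Because the penalty depends only on relative distance, it works at any length.
    """
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    pos = torch.arange(seq_len)
    rel = (pos[None, :] - pos[:, None]).clamp(max=0)  # j - i; future positions zeroed
    return slopes[:, None, None] * rel.float()        # (num_heads, seq_len, seq_len)

# Works the same way past the 2048-token training length:
bias = alibi_bias(num_heads=4, seq_len=3000)
print(bias.shape)  # torch.Size([4, 3000, 3000])
```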
That's from a modeling perspective. From a pure hardware perspective, longer sequences mean a larger memory footprint, so you might run into out-of-memory issues with 100,000 words (depending on your setup).
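As a rough back-of-the-envelope illustration (using BLOOM-176B's published shape of 70 layers and hidden size 14336; the fp16 assumption and token count are mine), just the key/value cache for generation already gets enormous at that length:

```python
# Rough KV-cache estimate for long-sequence generation with BLOOM-176B.
num_layers = 70
hidden_size = 14336
bytes_per_value = 2   # fp16
seq_len = 100_000     # the "100,000 words" scenario (tokens, strictly speaking)

# one key vector + one value vector per layer per token
kv_cache_bytes = 2 * num_layers * hidden_size * bytes_per_value * seq_len
print(f"KV cache alone: ~{kv_cache_bytes / 1e9:.0f} GB")  # ~401 GB
```

And that's before the attention score matrices, which grow quadratically with length unless the attention kernel avoids materializing them.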