Context window is only 8k???
#1, opened by rombodawg
This is from the original Gemma Config *kwargs; segment_size is what scales attention. It splits the attention into recurrent chunks of segment_size tokens, per https://arxiv.org/abs/2404.07143
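To illustrate the chunking idea only, here is a rough sketch in plain Python. It is not the model's actual code: `split_into_segments` and `process_recurrently` are hypothetical names, and the running counter is just a stand-in for whatever state a real model would carry between segments.

```python
from typing import Iterator, List, Sequence


def split_into_segments(tokens: Sequence[int], segment_size: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most segment_size tokens (hypothetical helper)."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    for start in range(0, len(tokens), segment_size):
        yield list(tokens[start:start + segment_size])


def process_recurrently(tokens: Sequence[int], segment_size: int) -> int:
    """Walk the segments in order, carrying state from one segment to the next.

    The counter below is only a placeholder for a real carried memory.
    """
    tokens_seen = 0
    for segment in split_into_segments(tokens, segment_size):
        # A real model would attend within `segment` and read/update its memory here.
        tokens_seen += len(segment)
    return tokens_seen


# Ten tokens with segment_size=4 give chunks of 4, 4 and 2 tokens.
segments = list(split_into_segments(list(range(10)), segment_size=4))
```

The point is that the input length is not tied to the size of a single chunk: a long input is consumed one segment at a time, in order.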