May be a bug
#24 by DSY001 - opened
I think that should be:
past_length = past_key_values[0][0].size(-3)
because the shape of past_key_values[0][0] is [bz, seq_len, num_layer, dim], so the sequence length sits on axis -3.
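A quick way to check the indexing, as a minimal sketch: the sizes below are made up for illustration, and a plain tuple stands in for the tensor's shape (in PyTorch, `tensor.size(-3)` returns the same value as `tensor.shape[-3]`). It assumes the layout is exactly the one stated above.

```python
# Hypothetical sizes, chosen only so that every axis is distinguishable.
bz, seq_len, num_layer, dim = 2, 7, 4, 16

# Stand-in for past_key_values[0][0].shape, using the layout stated above.
cache_shape = (bz, seq_len, num_layer, dim)

# Equivalent to past_key_values[0][0].size(-3) on a tensor of this shape.
past_length = cache_shape[-3]

print(past_length)      # the seq_len axis
print(cache_shape[-2])  # the num_layer axis, not the sequence length
```

With this layout, any index other than -3 (or 1) would read a different axis, so the past length would be wrong whenever seq_len differs from num_layer.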