Embedding matrix size

#3 by mrtnm

The following code shows that the first dimension of the embedding matrix is 384.

from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained('google/byt5-base')
print(model.get_input_embeddings().weight.shape)
# torch.Size([384, 1536])

Why is this not 259, i.e. one vector for each of the 256 possible byte values, plus the 3 additional special tokens mentioned in the paper?

I could not find any mention of this in the paper.
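For reference, the model config reports the same number without loading any weights. Here is a minimal sketch of that check (the expected count of 259 is my own arithmetic: 256 byte values plus the 3 special tokens):

from transformers import AutoConfig

# The config declares the same vocabulary size the embedding matrix shows.
config = AutoConfig.from_pretrained('google/byt5-base')
print(config.vocab_size)  # 384

# Expected from the paper: one entry per byte value plus 3 special tokens.
expected = 256 + 3
print(expected)  # 259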
