Update vocab size
Per https://huggingface.co/bigscience/bloom-560m/blob/main/config.json, the vocab size is 250880, not 250680.
I can't figure out how to update my PR in this interface, but perhaps there should be a note somewhere indicating that the actual vocab size is 250,680 and the embedding matrix is padded by 200 rows. The model's config.json says the vocab size is 250,880 but the card says 250,680. That is confusing to newcomers to BLOOM, because 256,901,120 embedding parameters / 1024 embedding dim = 250,880, not 250,680.
Okay, so 250,880 is the number of rows in the embedding matrix. However, the tokenizer only generates 250,680 different tokens. I think config.json sets the value to 250,880 because the embedding matrix has that number of rows.
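For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic discussed above. It only uses the figures quoted in this thread (nothing is downloaded from the Hub), and the constant and function names are made up for illustration.

```python
# Figures quoted in this thread; hard-coded here, not fetched from the Hub.
EMBEDDING_PARAMS = 256_901_120  # embedding parameter count cited above
HIDDEN_SIZE = 1024              # embedding dimension cited above
CONFIG_VOCAB = 250_880          # vocab size reported in config.json
TOKENIZER_VOCAB = 250_680       # number of tokens the tokenizer generates


def embedding_rows(num_params: int, hidden_size: int) -> int:
    """Return the number of rows in an embedding matrix of the given size."""
    if num_params % hidden_size != 0:
        raise ValueError("parameter count is not a multiple of the hidden size")
    return num_params // hidden_size


rows = embedding_rows(EMBEDDING_PARAMS, HIDDEN_SIZE)
padding_rows = rows - TOKENIZER_VOCAB

print(f"embedding rows: {rows}")
print(f"matches config.json: {rows == CONFIG_VOCAB}")
print(f"rows the tokenizer never uses: {padding_rows}")
```

So the config value matches the embedding matrix, and the difference from the tokenizer's vocab is the 200 rows of padding mentioned earlier.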
@mathemakitten
You can click on the PR label/button thingy and it will take you to that branch so you can update the PR in the GUI
(or use the git command line, if that is an option, @mathemakitten)