Update safetensors to have embedding layer

#7

Fixes https://github.com/huggingface/transformers/issues/34759

Proposed solution:
The safetensors file was missing the embedding layer. I loaded the model from the existing pytorch_model.bin weights file and re-saved it as safetensors.
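
Roughly, the conversion step looked like the following (a minimal sketch; the output directory name is just illustrative):

from transformers import AutoModelForCausalLM

# Load MobileLLM-125M from the existing pytorch_model.bin checkpoint
# (use_safetensors=False forces the .bin weights) and re-save it with
# safe_serialization=True so a complete model.safetensors is written.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/MobileLLM-125M",
    trust_remote_code=True,
    use_safetensors=False,
)
model.save_pretrained("MobileLLM-125M-updated", safe_serialization=True)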

You can test the functionality of the updated safetensors with the following script:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/MobileLLM-125M", use_fast=False)

# Model loaded from the current hub safetensors (embedding layer missing)
mobilellm_old = AutoModelForCausalLM.from_pretrained("facebook/MobileLLM-125M", trust_remote_code=True, use_safetensors=True)
# Model loaded from the locally updated safetensors produced in this PR
mobilellm = AutoModelForCausalLM.from_pretrained("/Users/mayankagarwal/Documents/OSS/codebases/MobileLLM-125M", trust_remote_code=True, use_safetensors=True)

inputs = tokenizer("Hello world!", return_tensors="pt")

output_old = mobilellm_old.generate(**inputs)
decoded = tokenizer.decode(output_old[0], skip_special_tokens=True)
print("Old decoded output:", decoded)

output = mobilellm.generate(**inputs)
decoded = tokenizer.decode(output[0], skip_special_tokens=True)
print("Updated decoded output:", decoded)

Here's a screenshot of the output:

[screenshot: image.png]

mayankagarwals changed pull request status to open

@zechunliu Please do take a look!

AI at Meta org

Thank you so much for raising this issue! It's a pity I only just noticed it. I have removed the safetensors file, and it should work now. The original pytorch_model.bin is correct. Let me know if you spot any other issues!

zechunliu changed pull request status to closed

Hey @zechunliu, the safetensors file actually did contain the embedding parameters; they were just stored under the lm_head name. To my knowledge the safetensors format doesn't allow weight sharing, so for the base model, which references this matrix as both lm_head and embed_tokens, serialization was arbitrarily dropping embed_tokens.

Would love to still have the original safetensors variant available, either with a small code change in model loading to re-tie the embeddings on load, or just by renaming the safetensors file to something like model_no_embed.safetensors. I have some training pipelines that rely on it, and I'm very grateful that you released these weights in the first place!
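
For what it's worth, the re-tie on load could look roughly like this (just a sketch, assuming the lm_head copy of the shared matrix is the one that survives in the safetensors checkpoint):

import torch
from transformers import AutoModelForCausalLM

# Sketch: load the safetensors checkpoint, then point the input embeddings
# back at the lm_head matrix, i.e. the surviving copy of the shared weight.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/MobileLLM-125M", trust_remote_code=True, use_safetensors=True
)
with torch.no_grad():
    model.get_input_embeddings().weight.copy_(model.get_output_embeddings().weight)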

(On a slightly unrelated note: were these models trained in BF16 or FP16? The tensors in this checkpoint appear to be FP16, while the larger models are in BF16. I can't quite tell from the GitHub repo or the paper which was intended either!)
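
For reference, one way to inspect the stored dtypes (assuming a local copy of the safetensors file; the path is illustrative):

from safetensors import safe_open

# Print the dtype of every tensor stored in the checkpoint.
with safe_open("model.safetensors", framework="pt") as f:
    for name in f.keys():
        print(name, f.get_tensor(name).dtype)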
