Failed to convert weights into HuggingFace format

#11
by hrezaei - opened

I managed to convert 7B_200B_1 into the transformers format, but for 7B_200B_4 an error like this occurs:
KeyError: 'layers.29.attention.wq.weight'

The command I ran is:

python /src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir ~/multi-token-prediction/7B_200B_4 --model_size 7B --output_dir ~/llama-multi-token/7B_200B_4

And the stack trace is:

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Saving a LlamaTokenizerFast to ~/llama-multi-token/7B_200B_4.
1 32 32 4096
Fetching all parameters from the checkpoint at ~/multi-token-prediction/7B_200B_4.
Traceback (most recent call last):
  File "src/transformers/models/llama/convert_llama_weights_to_hf.py", line 415, in <module>
    main()
  File "src/transformers/models/llama/convert_llama_weights_to_hf.py", line 403, in main
    write_model(
  File "src/transformers/models/llama/convert_llama_weights_to_hf.py", line 175, in write_model
    loaded[f"layers.{layer_i}.attention.wq.weight"], n_heads=n_heads
KeyError: 'layers.29.attention.wq.weight'
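
In case it helps with diagnosing this, here is a minimal sketch of how the checkpoint keys could be inspected to see which layers are actually present (the consolidated.00.pth filename is an assumption based on the original LLaMA checkpoint layout; adjust the path as needed):

import os
import torch

# Assumed checkpoint location; adjust to wherever 7B_200B_4 is stored.
ckpt_dir = os.path.expanduser("~/multi-token-prediction/7B_200B_4")

# Filename is an assumption, following the consolidated.*.pth layout of the
# original LLaMA releases.
state_dict = torch.load(os.path.join(ckpt_dir, "consolidated.00.pth"), map_location="cpu")

# Distinct layer indices that actually carry attention weights, to compare
# against the 32 layers the conversion script expects for the 7B config.
attn_layers = sorted(
    int(k.split(".")[1])
    for k in state_dict
    if k.startswith("layers.") and k.endswith("attention.wq.weight")
)
print("layers with attention.wq.weight:", attn_layers)

# Any keys outside the layers.<i>.* pattern, e.g. extra prediction heads the
# stock conversion script would not know about.
print("other keys:", [k for k in state_dict if not k.startswith("layers.")])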

Any workaround would be appreciated.
