Issue with ValueError: Trying to set a tensor of shape torch.Size([4096, 1024]) in RaDialog Inference Code
Hello,
I'm currently running the RaDialog inference code and encountered the following error:
Traceback (most recent call last):
  File "/home/hanjh/AIMS_MRG/RaDialog-interactive-radiology-report-generation/RaDialog_LLaVA_inference.py", line 43, in <module>
    tokenizer, model, image_processor, context_len = load_model_from_huggingface(repo_id="Chantal/RaDialog-interactive-radiology-report-generation")
  File "/home/hanjh/AIMS_MRG/RaDialog-interactive-radiology-report-generation/RaDialog_LLaVA_inference.py", line 26, in load_model_from_huggingface
    tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, model_base='liuhaotian/llava-v1.5-7b',
  File "/home/hanjh/AIMS_MRG/RaDialog-interactive-radiology-report-generation/LLAVA_Biovil/llava/model/builder.py", line 67, in load_pretrained_model
    model = LlavaLlamaForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, **kwargs)
  File "/home/hanjh/bin/miniconda3/envs/mrg_stage2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3926, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/hanjh/bin/miniconda3/envs/mrg_stage2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4400, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/hanjh/bin/miniconda3/envs/mrg_stage2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 936, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/hanjh/bin/miniconda3/envs/mrg_stage2/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 373, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([4096, 1024]) in "weight" (which has shape torch.Size([4096, 512])), this looks incorrect.
There seems to be a mismatch between the shape of the loaded weights and the model's architecture: the error indicates that a checkpoint tensor of shape [4096, 1024] is being loaded into a parameter named "weight" that the model defines with shape [4096, 512].
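One way to narrow this down might be to list every parameter whose shape differs between the instantiated model and the checkpoint, instead of stopping at the first failure. The following is only a minimal sketch: it works on plain name-to-shape dictionaries, and the function name `find_shape_mismatches` and the parameter name `some_layer.weight` are hypothetical placeholders, since the traceback only reports the name "weight".

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    expected: Dict[str, Shape], loaded: Dict[str, Shape]
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, expected_shape, loaded_shape) for every parameter
    that exists on both sides but with a different shape."""
    mismatches = []
    for name, loaded_shape in loaded.items():
        expected_shape = expected.get(name)
        if expected_shape is None:
            continue  # parameter missing from the model; not a shape mismatch
        if tuple(expected_shape) != tuple(loaded_shape):
            mismatches.append((name, tuple(expected_shape), tuple(loaded_shape)))
    return sorted(mismatches)


# Hypothetical shapes that mirror the numbers in the error message.
model_shapes = {"some_layer.weight": (4096, 512), "some_layer.bias": (4096,)}
checkpoint_shapes = {"some_layer.weight": (4096, 1024), "some_layer.bias": (4096,)}

for name, want, got in find_shape_mismatches(model_shapes, checkpoint_shapes):
    print(f"{name}: model expects {want}, checkpoint provides {got}")
```

With PyTorch installed, the two dictionaries could be built from real state dicts with a comprehension such as `{k: tuple(v.shape) for k, v in state_dict.items()}`.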
Could you please advise on how to resolve this issue? Any insights would be greatly appreciated.
Thank you!
Hi,
If I clone this repo and install the requirements as described at https://huggingface.co/ChantalPellegrini/RaDialog-interactive-radiology-report-generation, I can load the model and run the code successfully.
If you can share more details about the library versions you installed and any changes you made to the code, I can try to compare them with my setup.
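For the comparison, something like the following could be run inside the affected environment to collect the versions. It is only a sketch using the standard-library `importlib.metadata`; the package list is an assumption (transformers, accelerate and torch appear in the traceback, while peft and tokenizers are guesses) and can be adjusted, and `collect_versions` is just an illustrative helper name.

```python
import sys
from importlib import metadata


def collect_versions(packages):
    """Map each distribution name to its installed version,
    or to "not installed" if it cannot be found."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Assumed list of relevant packages; edit as needed.
PACKAGES = ["torch", "transformers", "accelerate", "peft", "tokenizers"]

print("python", sys.version.split()[0])
for package, version in collect_versions(PACKAGES).items():
    print(package, version)
```

Pasting the printed lines into this thread, together with any local changes to the code, should be enough to compare against a working setup.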