Failed when loading the model:
Loading checkpoint shards: 0%| | 0/8 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/root/workspace/testqa/testqa_with_bluelm_7b_chat_32k.py", line 10, in <module>
model = AutoModelForCausalLM.from_pretrained(
File "/opt/miniconda3/envs/willy/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
return model_class.from_pretrained(
File "/opt/miniconda3/envs/willy/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3180, in from_pretrained
) = cls._load_pretrained_model(
File "/opt/miniconda3/envs/willy/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3568, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/opt/miniconda3/envs/willy/lib/python3.10/site-packages/transformers/modeling_utils.py", line 745, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/opt/miniconda3/envs/willy/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 285, in set_module_tensor_to_device
raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([100096, 4096]) in "weight" (which has shape torch.Size([100004, 4096])), this look incorrect.
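The error means the checkpoint stores an embedding with 100096 rows while the instantiated model (per its config's vocab size) expects only 100004. A minimal sketch of the same mismatch, using plain PyTorch and a tiny embedding dimension for illustration (this is not the actual transformers/accelerate code path):

```python
import torch
import torch.nn as nn

# Model built from the config: 100004 vocab rows (tiny dim=8 for illustration)
emb = nn.Embedding(100004, 8)
# Checkpoint tensor: 100096 rows, as in the traceback above
ckpt_weight = torch.randn(100096, 8)

try:
    # Copying the checkpoint tensor into the smaller module fails,
    # analogous to the ValueError raised by accelerate during loading
    emb.weight.data.copy_(ckpt_weight)
except RuntimeError as e:
    print("shape mismatch:", e)

# Rebuilding the module to match the checkpoint's shape lets the copy succeed
emb = nn.Embedding(ckpt_weight.shape[0], ckpt_weight.shape[1])
emb.weight.data.copy_(ckpt_weight)
print(emb.weight.shape)  # torch.Size([100096, 8])
```

In practice this usually points to a config/checkpoint version mismatch (or a loading-path bug, as discussed below) rather than something to patch by hand.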
Can you provide your environment?
I've run into a similar problem. Are you loading the model onto the CPU first and then moving it to the GPU, rather than loading it directly onto the GPU? This model seems to have a bug when loaded that way.
We have fixed this bug.