RuntimeError: Index put requires the source and destination dtypes match, got BFloat16 for the destination and Float for the source.

#24
by saireddy - opened

Use case: I am trying to fine-tune Gemma 2 using SFTTrainer. Here is how I load the model, along with my bitsandbytes config:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_params = {
    "attn_implementation": "eager",  # eager attention is recommended for Gemma 2
    "torch_dtype": torch.bfloat16,
    "use_cache": True,
    "device_map": "auto",
}
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **model_params)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

BNB_CONFIG = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
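
(Note that, as posted, BNB_CONFIG is never actually passed to from_pretrained. If 4-bit loading is intended, it would normally be wired in through quantization_config; a minimal sketch, assuming the same MODEL_ID:)

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,  # this is what actually enables 4-bit loading
    attn_implementation="eager",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)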

Training arguments:

TRAINING_ARGS = {
    "num_train_epochs": 1,
    "optim": "adamw_torch_fused",
    "logging_steps": 20,
    "save_strategy": "epoch",
    "bf16": True,
    "tf32": True,
}
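
(For context, a minimal sketch of how these arguments would typically be fed to the trainer; output_dir, train_dataset, dataset_text_field, and max_seq_length are placeholders, not from the original post:)

from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(output_dir="gemma2-sft", **TRAINING_ARGS)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_dataset,   # placeholder dataset
    dataset_text_field="text",     # assumed column name
    max_seq_length=1024,           # assumed
)
trainer.train()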

When I try to use the fine-tuned model to generate predictions with

outputs = model.generate(
    input_ids=input_ids,
    max_new_tokens=max_new_tokens,
    do_sample=True,
    temperature=temperature,
    pad_token_id=tokenizer.eos_token_id,
)

I hit the error below. The same script works fine with Llama 3, Mistral, Qwen, and others:
RuntimeError: Index put requires the source and destination dtypes match, got BFloat16 for the destination and Float for the source.

Stack trace:

outputs = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens, do_sample=True,
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1491, in generate
outputs = self.base_model.generate(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1914, in generate
result = self._sample(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2651, in _sample
outputs = self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 1068, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 908, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 650, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 252, in forward
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/cache_utils.py", line 1071, in update
return update_fn(
File "/usr/local/lib/python3.10/dist-packages/transformers/cache_utils.py", line 1046, in _static_update
k_out[:, :, cache_position] = key_states
RuntimeError: Index put requires the source and destination dtypes match, got BFloat16 for the destination and Float for the source.

Hardware: NVIDIA H100 80GB
accelerate==0.31.0
bitsandbytes==0.43.1
datasets==2.18.0
deepspeed==0.14.4
evaluate==0.4.1
peft==0.11.1
transformers==4.42.3
trl==0.9.4
PyTorch image: nvcr.io/nvidia/pytorch:24.05-py3 (CUDA 12.4.1, torch 2.4)

@Renu11 any advice on this issue?

Do you know how to fix this bug?

@DeHors I was able to fix this issue by calling

model.to(torch.bfloat16)

before generating predictions.
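
(For a full fine-tune, that cast makes the projected key/value states bfloat16 so they match the bfloat16 KV cache they are written into at the k_out[:, :, cache_position] = key_states line in the stack trace above. A minimal sketch, with a placeholder prompt:)

import torch

model = model.to(torch.bfloat16)
model.eval()

input_ids = tokenizer("Hello, Gemma!", return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))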

But when I call model.to(torch.bfloat16) before generating predictions, I hit this error instead:
ValueError: .to is not supported for 4-bit or 8-bit bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype.

@DeHors It worked for me because I was doing a full fine-tune; I assume you are using LoRA or QLoRA via PEFT. I am not sure how to fix it for that case, sorry.
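
(For the 4-bit case, one direction that is sometimes suggested, untested here, so treat it as a sketch: leave the quantized base weights alone and cast only the fp32 LoRA adapter modules to bfloat16, or run generation under autocast so activations come out in bfloat16 either way:)

import torch

# Option 1: cast only the LoRA adapter modules. In a PEFT-wrapped model
# these are ordinary fp32 Linear layers, so .to() works on them even
# though it is blocked on the quantized top-level model.
for name, module in model.named_modules():
    if "lora" in name.lower():
        module.to(torch.bfloat16)

# Option 2: compute in bfloat16 under autocast regardless of parameter dtype.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    outputs = model.generate(
        input_ids=input_ids,
        max_new_tokens=64,
        pad_token_id=tokenizer.eos_token_id,
    )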

+1, I'm getting a similar error:
RuntimeError: Index put requires the source and destination dtypes match, got Half for the destination and Float for the source.


Sorry, I typed the same answer as above, so I hid my earlier comment.
