TypeError: pad(): argument 'pad' failed to unpack
#2 by cahya - opened
I tried to run the example script on an H100 (with the model "unsloth/Llama-4-Scout-17B-16E-Instruct"), but I got an error when generating the outputs:
>>> outputs = model.generate(
... **inputs,
... max_new_tokens=256,
... )
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/cahya/miniconda3/envs/transformers/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/cahya/miniconda3/envs/transformers/lib/python3.11/site-packages/transformers/generation/utils.py", line 2460, in generate
result = self._sample(
^^^^^^^^^^^^^
File "/home/cahya/miniconda3/envs/transformers/lib/python3.11/site-packages/transformers/generation/utils.py", line 3426, in _sample
outputs = self(**model_inputs, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
....
File "/home/cahya/miniconda3/envs/transformers/lib/python3.11/site-packages/transformers/integrations/flex_attention.py", line 103, in make_flex_block_causal_mask
attention_mask_2d = torch.nn.functional.pad(attention_mask_2d, value=0, pad=(0, key_length))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cahya/miniconda3/envs/transformers/lib/python3.11/site-packages/torch/nn/functional.py", line 5209, in pad
return torch._C._nn.pad(input, pad, mode, value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: pad(): argument 'pad' failed to unpack the object at pos 2 with error "type must be tuple of ints,but got NoneType"
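For context, this is roughly how the model was loaded (a minimal sketch following the example script; attn_implementation="flex_attention" is an assumption based on the flex_attention.py frame in the traceback):

import torch
from transformers import AutoProcessor, Llama4ForConditionalGeneration

model_id = "unsloth/Llama-4-Scout-17B-16E-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    attn_implementation="flex_attention",  # assumed; this selects the code path in the traceback
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
inputs = processor(text="Hello", return_tensors="pt").to(model.device)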
Does anyone have the same issue?
Try loading with attn_implementation="eager" instead of attn_implementation="flex_attention", i.e.:
import torch
from transformers import AutoProcessor, Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    attn_implementation="eager",  # bypasses the flex_attention mask construction
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
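Judging from the traceback, the failure happens in make_flex_block_causal_mask, where key_length appears to be None when it is passed to F.pad; loading with eager attention bypasses that code path entirely. A usage sketch after loading with eager (the prompt is illustrative):

messages = [{"role": "user", "content": "Who are you?"}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:])[0])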