Generating through HuggingFace Transformers leads to RuntimeError: probability tensor contains either `inf`, `nan` or element < 0. Generating through vLLM encounters no issues.
#2 · opened by paulhager
When running the code given on the model card to load and generate through the Hugging Face Transformers library, I encounter `RuntimeError: probability tensor contains either inf, nan or element < 0`.
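For reference, a minimal sketch of the Transformers path that triggers the error; the model ID, prompt, and generation settings below are placeholders rather than the exact model-card code:

```python
# Minimal sketch of the Transformers generation path that raises the error.
# "org/model-name" is a placeholder for the actual model repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # dtype resolved from the checkpoint config
    device_map="auto",
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# -> RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```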
When loading and serving the model through vLLM using the exact same model shards, no errors are encountered.
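For comparison, a rough sketch of the vLLM path that works with the same shards; again the model ID, prompt, and sampling parameters are placeholders:

```python
# Sketch of the vLLM path that generates without errors.
from vllm import LLM, SamplingParams

llm = LLM(model="org/model-name", dtype="bfloat16")  # same placeholder repo
params = SamplingParams(temperature=0.7, max_tokens=64)
print(llm.generate(["Hello, how are you?"], params)[0].outputs[0].text)
```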
What could be the problem here? `torch_dtype="auto"` is set in `AutoModelForCausalLM.from_pretrained`, and manually casting with `model = model.bfloat16()` also has no effect.
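A sketch of the dtype handling described above, assuming the same placeholder model ID; neither the `"auto"` dtype at load time nor the manual cast changes the error:

```python
# Dtype overrides that were tried, with no effect on the error.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "org/model-name",    # placeholder repo, as above
    torch_dtype="auto",  # resolved from the checkpoint config
    device_map="auto",
)
model = model.bfloat16()  # explicit cast to bfloat16, no difference
```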
The same behavior is encountered with the 72B GPTQ Int4 model.
I'm using an A40 GPU.