[Bug] RuntimeError when running example code with float16

#9
by Isotr0py - opened

When I ran the example code with torch.float16, it raised a RuntimeError, as shown below:

/opt/conda/envs/vllm/lib/python3.10/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Warning: Flash attention is not available, using eager attention instead.
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:03<00:00,  1.98s/it]
/opt/conda/envs/vllm/lib/python3.10/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
User: Hello, who are you?
Assistant: I am an AI assistant that specializes in answering any question you have. How can I help you today?
User: Can you tell me a story?
Assistant: Sure, I'd be happy to share a story with you. What kind of story do you enjoy?
Traceback (most recent call last):
  File "/kaggle/working/mono-internvl.py", line 107, in <module>
    response = model.chat(tokenizer, pixel_values, question, generation_config)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/Mono-InternVL-2B/c5f9f4332ae9883fee9cf86968c18601d784d065/modeling_internvl_chat.py", line 393, in chat
    generation_output = self.generate(
  File "/opt/conda/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/Mono-InternVL-2B/c5f9f4332ae9883fee9cf86968c18601d784d065/modeling_internvl_chat.py", line 447, in generate
    outputs = self.language_model.generate(
  File "/opt/conda/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/vllm/lib/python3.10/site-packages/transformers/generation/utils.py", line 1525, in generate
    return self.sample(
  File "/opt/conda/envs/vllm/lib/python3.10/site-packages/transformers/generation/utils.py", line 2658, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

However, torch.bfloat16 works fine, and the error only occurs when chatting with an image.
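A likely explanation (an assumption on my part, not confirmed against the model's internals): float16 has a much smaller dynamic range (max ~65504) than bfloat16, so a large intermediate activation in the vision path can overflow to inf, which turns into nan after softmax and trips the check inside `torch.multinomial`. The toy values below are illustrative only:

```python
import torch

# A value that fits in float32/bfloat16 but overflows float16's range (~65504).
big = torch.tensor([70000.0, 1.0], dtype=torch.float32)

logits_fp16 = big.to(torch.float16)
assert torch.isinf(logits_fp16[0])  # overflowed to inf in float16

# softmax over a tensor containing inf yields nan (inf - inf during the
# max-subtraction step), which is exactly what multinomial rejects.
probs = torch.softmax(logits_fp16.float(), dim=-1)
assert torch.isnan(probs).any()

try:
    torch.multinomial(probs, num_samples=1)
except RuntimeError as e:
    # "probability tensor contains either `inf`, `nan` or element < 0"
    print(e)

# bfloat16 keeps float32's exponent range, so the same value survives
# and sampling works.
logits_bf16 = big.to(torch.bfloat16)
probs_bf16 = torch.softmax(logits_bf16.float(), dim=-1)
print(torch.multinomial(probs_bf16, num_samples=1))
```

This would also be consistent with the error appearing only with image inputs, since the vision encoder contributes the activations that overflow.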
