LanguageBind/Video-LLaVA-7B-hf · Cannot reproduce the example output

I am running the example script unchanged, but instead of the expected answer I get these warnings followed by garbled text:

Expanding inputs for image tokens in Video-LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.44.
Expanding inputs for image tokens in Video-LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.47.
USER: Why is this video funny? ASSISTANT: The and? and??????????? [? [ and, [ [ [ [ [ [ [ [ [ [, [, [ and, [, and, and, and, and, and, and, and, and, and, and, and, and,
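
For reference, the warning asks for two attributes to be set on the processor. Below is a minimal sketch of what that could look like, assuming the values can be read from the model config (`config.vision_config.patch_size` and `config.vision_feature_select_strategy`). The helper name `set_expansion_attrs` is hypothetical, and the stand-in objects and their values are placeholders so the snippet runs without downloading the checkpoint. Whether this has any effect on the garbled generation is not confirmed.

```python
from types import SimpleNamespace


def set_expansion_attrs(processor, config):
    """Copy the two attributes named in the warning from a model config onto a processor.

    Hypothetical helper: it only performs the assignments the warning suggests.
    """
    processor.patch_size = config.vision_config.patch_size
    processor.vision_feature_select_strategy = config.vision_feature_select_strategy
    return processor


# Stand-ins for the real processor and model config.
# The values below are placeholders, not verified settings of this checkpoint.
config = SimpleNamespace(
    vision_config=SimpleNamespace(patch_size=14),
    vision_feature_select_strategy="default",
)
processor = set_expansion_attrs(SimpleNamespace(), config)
print(processor.patch_size, processor.vision_feature_select_strategy)
```

With the real objects, the same helper would presumably be called as `set_expansion_attrs(processor, model.config)` after loading both with `from_pretrained`, and before calling the processor on the prompt and video.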

Would you please help with this?