Tags: Visual Question Answering, Transformers, English, videollama2_mixtral, text-generation, multimodal large language model, large video-language model, Inference Endpoints
Commit b41dd4a by Sicong (1 parent: f1f1ca1)

Update README.md

Files changed (1): README.md (+1 -1)
@@ -81,7 +81,7 @@ def inference():
     # The woman in the image is wearing a black coat and sunglasses, and she is walking down a rain-soaked city street. The image feels vibrant and lively, with the bright city lights reflecting off the wet pavement, creating a visually appealing atmosphere. The woman's presence adds a sense of style and confidence to the scene, as she navigates the bustling urban environment.
     modal_list = ['image']
     # 1. Initialize the model.
-    model_path = 'DAMO-NLP-SG/VideoLLaMA2-7B-Base'
+    model_path = 'DAMO-NLP-SG/VideoLLaMA2-8x7B-Base'
     model_name = get_model_name_from_path(model_path)
     tokenizer, model, processor, context_len = load_pretrained_model(model_path, None, model_name)
     model = model.to('cuda:0')
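The hunk header `@@ -81,7 +81,7 @@` says the hunk covers 7 lines starting at line 81 in both the old and the new README.md, and the `+1 -1` count reflects the single replaced `model_path` line. The sketch below reproduces that hunk with Python's standard `difflib.unified_diff`, assuming the default three lines of context. Lines 1-80 of README.md are not part of this commit view, so the sketch pads them with hypothetical filler lines, and the long reply comment is abbreviated. Unlike `git diff`, `difflib` does not append the enclosing function (`def inference():`) to the hunk header.

```python
import difflib

# Hypothetical stand-ins for README.md lines 1-80 (not shown in the commit).
prefix = [f"filler line {i}\n" for i in range(1, 81)]

# Unchanged context around the edited line (reply comment abbreviated).
context_before = [
    "    # The woman in the image is wearing a black coat and sunglasses, ...\n",
    "    modal_list = ['image']\n",
    "    # 1. Initialize the model.\n",
]
context_after = [
    "    model_name = get_model_name_from_path(model_path)\n",
    "    tokenizer, model, processor, context_len = load_pretrained_model(model_path, None, model_name)\n",
    "    model = model.to('cuda:0')\n",
]

# Old and new versions differ only in the checkpoint id on line 84.
old = prefix + context_before + ["    model_path = 'DAMO-NLP-SG/VideoLLaMA2-7B-Base'\n"] + context_after
new = prefix + context_before + ["    model_path = 'DAMO-NLP-SG/VideoLLaMA2-8x7B-Base'\n"] + context_after

# n=3 keeps three context lines on each side of the change (lines 81-87).
diff = list(difflib.unified_diff(old, new, fromfile="a/README.md", tofile="b/README.md", n=3))
print("".join(diff))
```

The printed hunk starts with `@@ -81,7 +81,7 @@` and contains exactly one `-` line and one `+` line.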