Commit 70e353f (parent cc1e179) by ZhangYuanhan: Update README.md

Files changed (1): README.md (+6 -2)
README.md CHANGED
@@ -130,7 +130,9 @@ base_model:
 
 ## Model Summary
 
-The LLaVA-NeXT-Video models are 7/72B parameter models trained on [LLaVA-NeXT-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data), based on Qwen2 language model with a context window of 32K tokens.
+The LLaVA-NeXT-Video models are 7B/72B-parameter models trained on [LLaVA-NeXT-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and the [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), based on the Qwen2 language model with a 32K-token context window.
+
+This model supports at most 64 frames.
 
 - **Repository:** [LLaVA-VL/LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT?tab=readme-ov-file)
 - **Point of Contact:** [Yuanhan Zhang](https://zhangyuanhan-ai.github.io/)
@@ -141,7 +143,9 @@ The LLaVA-NeXT-Video models are 7/72B parameter models trained on [LLaVA-NeXT-Vi
 
 ### Intended use
 
-The model was trained on [LLaVA-NeXT-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and have the ability to interact with images, multi-image and videos, but specific to videos.
+The model was trained on [LLaVA-NeXT-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and the [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data); it can interact with images, multi-image inputs, and videos, but is specialized for videos.
+
+
 
 **Feel free to share your generations in the Community tab!**
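The 64-frame cap added in this commit means longer videos must be subsampled before inference. A minimal sketch of uniform frame-index sampling, assuming only the total frame count is known (the function name and signature are illustrative, not part of the LLaVA-NeXT codebase):

```python
def sample_frame_indices(total_frames: int, max_frames: int = 64) -> list[int]:
    """Pick at most max_frames indices spread uniformly across the video."""
    if total_frames <= max_frames:
        # Short video: keep every frame.
        return list(range(total_frames))
    # Uniform spacing: index i maps to floor(i * total_frames / max_frames).
    return [int(i * total_frames / max_frames) for i in range(max_frames)]

# e.g. a 300-frame clip is reduced to 64 evenly spaced frame indices
indices = sample_frame_indices(300, 64)
```

The selected indices would then be used to decode only those frames before handing them to the model's image processor.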