ZhangYuanhan committed 93ad570 (1 parent: d10c7bb)

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED
```diff
@@ -127,7 +127,7 @@ model-index:
 
 ## Model Summary
 
-The LLaVA-OneVision models are 7/72B parameter models trained on [LLaVA-NeXT-Video-SFT](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data), based on Qwen2 language model with a context window of 32K tokens.
+The LLaVA-OneVision models are 7/72B parameter models trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data), based on Qwen2 language model with a context window of 32K tokens.
 
 - **Repository:** [LLaVA-VL/LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT?tab=readme-ov-file)
 - **Point of Contact:** [Yuanhan Zhang](https://zhangyuanhan-ai.github.io/)
@@ -138,7 +138,7 @@ The LLaVA-OneVision models are 7/72B parameter models trained on [LLaVA-NeXT-Vid
 
 ### Intended use
 
-The model was trained on [LLaVA-NeXT-Video-SFT](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and have the ability to interact with images, multi-image and videos, but specific to videos.
+The model was trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and have the ability to interact with images, multi-image and videos, but specific to videos.
 
 **Feel free to share your generations in the Community tab!**
```
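As context for the "Intended use" section edited above, here is a minimal sketch of video inference with a LLaVA-OneVision-style checkpoint through Hugging Face `transformers`. This commit does not name the exact Hub repository, so the `model_id` below is an assumed example, and the random frames stand in for a clip sampled from a real video; `LlavaOnevisionForConditionalGeneration` and `AutoProcessor` are standard `transformers` classes.

```python
# Hypothetical sketch (not from this commit): video QA with a
# LLaVA-OneVision-style checkpoint via Hugging Face transformers.
import numpy as np
import torch
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "llava-hf/llava-onevision-qwen2-7b-ov-hf"  # assumed example ID

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the `accelerate` package
)

# Stand-in for real input: 8 RGB frames of shape (H, W, C). In practice,
# sample frames from a video file (e.g. with decord or PyAV).
frames = np.random.randint(0, 256, (8, 224, 224, 3), dtype=np.uint8)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "video"},
            {"type": "text", "text": "Describe what happens in this video."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(text=prompt, videos=[frames], return_tensors="pt").to(
    model.device, torch.float16
)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

For the authors' own inference pipeline, see the linked [LLaVA-VL/LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT?tab=readme-ov-file) repository.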