Commit c0a6bab (parent: e6807e3), committed by ZhangYuanhan

Update README.md

Files changed (1): README.md (+1, -1)
README.md CHANGED
@@ -135,7 +135,7 @@ The LLaVA-Video models are 7/72B parameter models trained on [LLaVA-Video-178K](
 This model support at most 64 frames.

 - **Project Page:** [Project Page](https://llava-vl.github.io/blog/2024-09-30-llava-video/).
-- **Paper**: For more details, please check our [paper](arxiv.org/abs/2410.02713)
+- **Paper**: For more details, please check our [paper](https://arxiv.org/abs/2410.02713)
 - **Repository:** [LLaVA-VL/LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT?tab=readme-ov-file)
 - **Point of Contact:** [Yuanhan Zhang](https://zhangyuanhan-ai.github.io/)
 - **Languages:** English, Chinese
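
The README context above notes that the model supports at most 64 frames per video, so longer clips must be subsampled before encoding. A minimal sketch of uniform frame-index subsampling under that cap; the helper name `sample_frame_indices` and the NumPy-based approach are illustrative assumptions, not the repository's own loader:

```python
import numpy as np

def sample_frame_indices(total_frames: int, max_frames: int = 64) -> np.ndarray:
    """Pick up to `max_frames` uniformly spaced frame indices from a video.

    Hypothetical helper: LLaVA-Video accepts at most 64 frames, so videos
    with more frames are subsampled evenly across their full duration.
    """
    num_frames = min(total_frames, max_frames)
    # linspace spreads positions evenly over the whole clip; rounding maps
    # them back to valid integer frame indices.
    return np.linspace(0, total_frames - 1, num_frames).round().astype(int)

# Example: a 30 fps, 10-second clip has 300 frames -> 64 indices are kept.
print(sample_frame_indices(300))
```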