---
license: mit
pipeline_tag: video-text-to-text
---

This repository contains the model described in [VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation](https://huggingface.co/papers/2412.00927).