license: mit
pipeline_tag: video-text-to-text
This repository contains the model described in the paper "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation".