VISTA-LongVA / README.md
license: mit
pipeline_tag: video-text-to-text

This repository contains the model described in the paper *VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation*.