AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Paper (arXiv 2410.03051): Efficient, Performant Video Detailed Captioning and a New Benchmark
Note The VDC benchmark contains 1,027 videos with captions averaging over 500 words.
Note The VDC benchmark in lmms-eval format.
Note A collection of over 20M images and videos for AuroraCap training, pre-tokenized for Vicuna and Llama-3.
Note Video data recaptioned by AuroraCap.