Large-scale Pre-training for Grounded Video Caption Generation Paper • 2503.10781 • Published 7 days ago • 16
TIM: A Time Interval Machine for Audio-Visual Action Recognition Paper • 2404.05559 • Published Apr 8, 2024