VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Paper • 2504.13122 • Published 5 days ago • 21
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published 22 days ago • 75
Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark Paper • 2502.04976 • Published Feb 7
NUS-Emo at SemEval-2024 Task 3: Instruction-Tuning LLM for Multimodal Emotion-Cause Analysis in Conversations Paper • 2501.17261 • Published Aug 22, 2024
A Survey on Benchmarks of Multimodal Large Language Models Paper • 2408.08632 • Published Aug 16, 2024 • 2