Unicorn: Text-Only Data Synthesis for Vision Language Model Training • Paper • 2503.22655 • Published 9 days ago • 37
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation • Paper • 2503.06053 • Published 30 days ago • 136
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models • Paper • 2407.05131 • Published Jul 6, 2024 • 27