POINTS1.5: Building a Vision-Language Model towards Real World Applications Paper • 2412.08443 • Published 14 days ago • 38
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published 19 days ago • 46
MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion Paper • 2307.01097 • Published Jul 3, 2023 • 10
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction Paper • 2402.12712 • Published Feb 20 • 17
MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping Paper • 2403.15951 • Published Mar 23 • 1
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14 • 38