🚨 New VQA + captioning dataset! moondream/megalith-mdqa
Images from Megalith, captioned using Moondream, then transformed to short-form QA.
9M+ images, 6-10 QA pairs per image.
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images Paper • 2501.04689 • Published Jan 8 • 17
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published Jan 7 • 46
Structured 3D Latents for Scalable and Versatile 3D Generation Paper • 2412.01506 • Published Dec 2, 2024 • 71