SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 8 days ago • 158
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Paper • 2503.05132 • Published Mar 7 • 55
KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language Paper • 2503.23730 • Published 16 days ago • 4
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published 26 days ago • 38
mistralai/Mistral-Small-3.1-24B-Instruct-2503 Image-Text-to-Text • Updated 7 days ago • 141k • • 1.12k