SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 13 days ago • 162
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI Paper • 2411.14522 • Published Nov 21, 2024 • 39
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation Paper • 2503.22194 • Published 23 days ago • 24
Modifying Large Language Model Post-Training for Diverse Creative Writing Paper • 2503.17126 • Published 30 days ago • 36
TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools Paper • 2503.10970 • Published Mar 14 • 16
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published Mar 17 • 21
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective Paper • 2502.17262 • Published Feb 24 • 20
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents Paper • 2502.16069 • Published Feb 22 • 19
Tulu 3 Models Collection All models released with Tulu 3 -- state of the art open post-training recipes. • 11 items • Updated Mar 13 • 96
Temporal Preference Optimization for Long-Form Video Understanding Paper • 2501.13919 • Published Jan 23 • 22
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published Dec 19, 2024 • 29
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 99
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published Jan 13 • 56
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning Paper • 2408.07931 • Published Aug 15, 2024 • 22
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Paper • 2407.06189 • Published Jul 8, 2024 • 27