GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Paper • 2503.03751 • Published about 13 hours ago • 6
C4AI Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 1 day ago • 54
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think Paper • 2502.20172 • Published 7 days ago • 25
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published 7 days ago • 56
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 14 days ago • 171
Five A^{+} Network: You Only Need 9K Parameters for Underwater Image Enhancement Paper • 2305.08824 • Published May 15, 2023 • 2
view article Article PaliGemma 2 Mix - New Instruction Vision Language Models by Google 15 days ago • 61
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Paper • 2502.09604 • Published 21 days ago • 32
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 108
Tulu 3 Models Collection All models released with Tulu 3 -- state of the art open post-training recipes. • 11 items • Updated 22 days ago • 93
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 339
Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published Dec 23, 2024 • 30
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 129
Mimir: Improving Video Diffusion Models for Precise Text Understanding Paper • 2412.03085 • Published Dec 4, 2024 • 12
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 80
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models Paper • 2411.13503 • Published Nov 20, 2024 • 31
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory Paper • 2411.11922 • Published Nov 18, 2024 • 19