MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Paper • 2503.05978 • Published 5 days ago • 24
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Paper • 2503.03751 • Published 7 days ago • 19
C4AI Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 8 days ago • 62
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think Paper • 2502.20172 • Published 13 days ago • 26
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published 14 days ago • 57
google/siglip2-so400m-patch14-384 Zero-Shot Image Classification • Updated 19 days ago • 589k • 13
google/siglip2-so400m-patch16-naflex Zero-Shot Image Classification • Updated 19 days ago • 15.6k • 15
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 20 days ago • 178
Five A^{+} Network: You Only Need 9K Parameters for Underwater Image Enhancement Paper • 2305.08824 • Published May 15, 2023 • 2
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters • Running • 2.21k
PaliGemma 2 Mix - New Instruction Vision Language Models by Google Article • 22 days ago • 65
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Paper • 2502.09604 • Published 27 days ago • 32
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 108