Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 11 days ago • 26
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency Paper • 2503.20785 • Published 26 days ago • 21
Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals Paper • 2503.19953 • Published 27 days ago • 3
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement Paper • 2503.04919 • Published Mar 6 • 8
Open Deep Search: Democratizing Search with Open-source Reasoning Agents Paper • 2503.20201 • Published 27 days ago • 46
AMD-Hummingbird: Towards an Efficient Text-to-Video Model Paper • 2503.18559 • Published 29 days ago • 5
Cosmos Transfer1 Collection Multimodal Conditional World Generation for World2World Transfer • 5 items • Updated 7 days ago • 14
FFN Fusion: Rethinking Sequential Computation in Large Language Models Paper • 2503.18908 • Published 28 days ago • 18
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning Paper • 2503.18013 • Published 30 days ago • 19
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering Paper • 2503.16422 • Published Mar 20 • 14
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20 • 39
Physical AI Collection Collection of commercial-grade datasets for physical AI developers • 10 items • Updated 7 days ago • 46