Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published 5 days ago • 50
Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis Paper • 2605.18451 • Published 7 days ago • 40
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published 11 days ago • 81
Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video Paper • 2605.15182 • Published 11 days ago • 39
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published 13 days ago • 186
EgoSim: Egocentric World Simulator for Embodied Interaction Generation Paper • 2604.01001 • Published Apr 1 • 38
MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE Paper • 2602.08961 • Published Feb 9 • 5
WorldCompass: Reinforcement Learning for Long-Horizon World Models Paper • 2602.09022 • Published Feb 9 • 21
SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation Paper • 2602.02402 • Published Feb 2 • 32
PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction Paper • 2601.22046 • Published Jan 29 • 21
RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation Paper • 2601.05241 • Published Jan 8 • 24
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling Paper • 2512.03000 • Published Dec 2, 2025 • 37
AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views Paper • 2505.23716 • Published May 29, 2025 • 31