Depth Pro: Sharp Monocular Metric Depth in Less Than a Second • Paper • 2410.02073 • Published 2 days ago
Loong: Generating Minute-level Long Videos with Autoregressive Language Models • Paper • 2410.02757 • Published 1 day ago
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration • Paper • 2410.02367 • Published 1 day ago
EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis • Paper • 2410.01804 • Published 2 days ago
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation • Paper • 2410.01731 • Published 2 days ago
HelpSteer2-Preference: Complementing Ratings with Preferences • Paper • 2410.01257 • Published 3 days ago
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation • Paper • 2410.00890 • Published 3 days ago
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices • Paper • 2410.00531 • Published 4 days ago
Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models • Paper • 2410.00231 • Published 4 days ago
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer • Paper • 2410.00086 • Published 4 days ago
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation • Paper • 2409.18964 • Published 7 days ago
Pixel-Space Post-Training of Latent Diffusion Models • Paper • 2409.17565 • Published 9 days ago
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness • Paper • 2409.18125 • Published 8 days ago
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction • Paper • 2409.18124 • Published 8 days ago
Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction • Paper • 2409.18121 • Published 8 days ago
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions • Paper • 2409.18042 • Published 8 days ago
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models • Paper • 2409.17481 • Published 9 days ago
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction • Paper • 2409.17422 • Published 9 days ago
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image • Paper • 2409.17280 • Published 9 days ago
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale • Paper • 2409.16299 • Published 25 days ago
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models • Paper • 2409.17146 • Published 9 days ago
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion • Paper • 2409.17145 • Published 9 days ago
TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans • Paper • 2409.16666 • Published 10 days ago
Synchronize Dual Hands for Physics-Based Dexterous Guitar Playing • Paper • 2409.16629 • Published 10 days ago
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts • Paper • 2409.16040 • Published 11 days ago
MaskBit: Embedding-free Image Generation via Bit Tokens • Paper • 2409.16211 • Published 10 days ago
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling • Paper • 2409.16160 • Published 11 days ago
Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation • Paper • 2409.16283 • Published 10 days ago
MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors • Paper • 2409.15273 • Published 11 days ago
Phantom of Latent for Large Language and Vision Models • Paper • 2409.14713 • Published 12 days ago
MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting • Paper • 2409.14393 • Published 13 days ago
Self-Supervised Audio-Visual Soundscape Stylization • Paper • 2409.14340 • Published 13 days ago
Prithvi WxC: Foundation Model for Weather and Climate • Paper • 2409.13598 • Published 14 days ago
Imagine yourself: Tuning-Free Personalized Image Generation • Paper • 2409.13346 • Published 15 days ago
Colorful Diffuse Intrinsic Image Decomposition in the Wild • Paper • 2409.13690 • Published 14 days ago
V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians • Paper • 2409.13648 • Published 14 days ago
Portrait Video Editing Empowered by Multimodal Generative Priors • Paper • 2409.13591 • Published 14 days ago
Training Language Models to Self-Correct via Reinforcement Learning • Paper • 2409.12917 • Published 15 days ago
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution • Paper • 2409.12961 • Published 15 days ago
LVCD: Reference-based Lineart Video Colorization with Diffusion Models • Paper • 2409.12960 • Published 15 days ago
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion • Paper • 2409.12957 • Published 15 days ago
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt • Paper • 2409.12892 • Published 15 days ago
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation • Paper • 2409.12576 • Published 16 days ago
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning • Paper • 2409.12568 • Published 16 days ago
FlexiTex: Enhancing Texture Generation with Visual Guidance • Paper • 2409.12431 • Published 16 days ago
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation • Paper • 2409.12532 • Published 16 days ago
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning • Paper • 2409.12183 • Published 16 days ago