LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis Paper • 2412.15214 • Published 6 days ago • 14
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay Paper • 2412.04449 • Published 20 days ago • 6
FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution Paper • 2410.22655 • Published Oct 30 • 1
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22 • 22
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22 • 22
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14 • 13
VideoMamba: State Space Model for Efficient Video Understanding Paper • 2403.06977 • Published Mar 11 • 27
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering Paper • 2312.00109 • Published Nov 30, 2023 • 9
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation Paper • 2307.06942 • Published Jul 13, 2023 • 22