An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models • Paper • 2403.06764 • Published Mar 11, 2024 • 29
VideoMamba: State Space Model for Efficient Video Understanding • Paper • 2403.06977 • Published Mar 11, 2024 • 31
V3D: Video Diffusion Models are Effective 3D Generators • Paper • 2403.06738 • Published Mar 11, 2024 • 31
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation • Paper • 2403.06775 • Published Mar 11, 2024 • 5
StableDrag: Stable Dragging for Point-based Image Editing • Paper • 2403.04437 • Published Mar 7, 2024 • 30
How Far Are We from Intelligent Visual Deductive Reasoning? • Paper • 2403.04732 • Published Mar 7, 2024 • 24
Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis • Paper • 2403.04116 • Published Mar 7, 2024 • 7
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation • Paper • 2403.04692 • Published Mar 7, 2024 • 42
CameraCtrl: Enabling Camera Control for Text-to-Video Generation • Paper • 2404.02101 • Published Apr 2, 2024 • 25
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model • Paper • 2404.01331 • Published Mar 29, 2024 • 28
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer • Paper • 2405.17405 • Published May 27, 2024 • 17
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control • Paper • 2405.17414 • Published May 27, 2024 • 12
PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models • Paper • 2412.18608 • Published Dec 24, 2024 • 18
MMFactory: A Universal Solution Search Engine for Vision-Language Tasks • Paper • 2412.18072 • Published Dec 24, 2024 • 19
WavePulse: Real-time Content Analytics of Radio Livestreams • Paper • 2412.17998 • Published Dec 23, 2024 • 11
PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion • Paper • 2412.17780 • Published Dec 23, 2024 • 5