VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published 10 days ago • 46
Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis Paper • 2412.15322 • Published 24 days ago • 18
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published about 1 month ago • 137
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers Paper • 2412.09611 • Published Dec 12, 2024 • 9
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models Paper • 2412.07674 • Published Dec 10, 2024 • 20
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation Paper • 2412.05148 • Published Dec 6, 2024 • 11
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics Paper • 2412.07774 • Published Dec 10, 2024 • 26
Mind the Time: Temporally-Controlled Multi-Event Video Generation Paper • 2412.05263 • Published Dec 6, 2024 • 10
AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 69
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity Paper • 2411.15411 • Published Nov 23, 2024 • 7
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE Paper • 2411.16856 • Published Nov 25, 2024 • 11
TEXGen: a Generative Diffusion Model for Mesh Textures Paper • 2411.14740 • Published Nov 22, 2024 • 15
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator Paper • 2411.15466 • Published Nov 23, 2024 • 35
Material Anything: Generating Materials for Any 3D Object via Diffusion Paper • 2411.15138 • Published Nov 22, 2024 • 42