CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published Nov 27 • 50
DELTA: Dense Efficient Long-range 3D Tracking for any video Paper • 2410.24211 • Published Oct 31 • 8
A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents Paper • 2410.22476 • Published Oct 29 • 25
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28 • 77
Toxicity of the Commons: Curating Open-Source Pre-Training Data Paper • 2410.22587 • Published Oct 29 • 10
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation Paper • 2410.23277 • Published Oct 30 • 9
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Paper • 2410.21465 • Published Oct 28 • 11
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior Paper • 2410.21264 • Published Oct 28 • 9
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training Paper • 2410.19313 • Published Oct 25 • 19
GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation Paper • 2410.20474 • Published Oct 27 • 14
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published Oct 26 • 23
DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation Paper • 2410.18666 • Published Oct 24 • 19
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality Paper • 2410.19355 • Published Oct 25 • 23
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting Paper • 2410.17856 • Published Oct 23 • 49
Allegro: Open the Black Box of Commercial-Level Video Generation Model Paper • 2410.15458 • Published Oct 20 • 40