Qwen2.5-VL • Collection • Vision-language model series based on Qwen2.5 • 11 items • Updated 14 days ago • 441
Phi-4 • Collection • Phi-4 family of small language and multi-modal models. • 7 items • Updated Mar 3 • 114
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • Paper • 2502.11089 • Published Feb 16 • 152
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video • Paper • 2404.09833 • Published Apr 15, 2024 • 31
Masked Audio Generation using a Single Non-Autoregressive Transformer • Paper • 2401.04577 • Published Jan 9, 2024 • 44
DocLLM: A layout-aware generative language model for multimodal document understanding • Paper • 2401.00908 • Published Dec 31, 2023 • 184
DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision • Paper • 2312.16256 • Published Dec 26, 2023 • 17
PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar • Paper • 2312.14239 • Published Dec 21, 2023 • 12
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit • Paper • 2312.09911 • Published Dec 15, 2023 • 55