Less-to-More Generalization: Unlocking More Controllability by In-Context Generation Paper • 2504.02160 • Published 4 days ago • 1
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis Paper • 2502.18924 • Published Feb 26 • 9
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published 6 days ago • 33
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published 12 days ago • 47
Article LeRobot goes to driving school: World’s largest open-source self-driving dataset 27 days ago • 73
RWKV-7 "Goose" with Expressive Dynamic State Evolution Paper • 2503.14456 • Published 19 days ago • 135
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published 23 days ago • 85
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation Paper • 2503.09641 • Published 26 days ago • 31
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 7 items • Updated Mar 3 • 113
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 13 • 148
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 220