Neural Vocoder is All You Need for Speech Super-resolution Paper • 2203.14941 • Published Mar 28, 2022 • 1
view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM 12 days ago • 341
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities Paper • 2503.03983 • Published 18 days ago • 22
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated 26 days ago • 568
CLAP: Contrastive Language-Audio Pretraining Collection CLAP is to audio what CLIP is to image. • 5 items • Updated Oct 31, 2023 • 10
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities Paper • 2402.01831 • Published Feb 2, 2024 • 15
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 208
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 108
Cosmos Tokenizer Collection A suite of image and video tokenizers • 13 items • Updated 6 days ago • 40
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model Paper • 2402.03766 • Published Feb 6, 2024 • 14
DeepSeek-VL: Towards Real-World Vision-Language Understanding Paper • 2403.05525 • Published Mar 8, 2024 • 44