MoshiVis v0.1 Collection MoshiVis is a Vision Speech Model built as a perceptually-augmented version of Moshi v0.1 for conversing about image inputs • 8 items • Updated 13 days ago • 20
Qwen2.5-Omni Collection End-to-end omni-modal model (text, audio, image, and video inputs with natural speech interaction) based on Qwen2.5 • 3 items • Updated 8 days ago • 76
Gemma 3 Collection All versions of Google's new multimodal models in 1B, 4B, 12B, and 27B sizes. In GGUF, dynamic 4-bit and 16-bit formats. • 29 items • Updated 4 days ago • 48
Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality about 1 month ago • 71
C4AI Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated about 1 month ago • 68
Article Introducing smolagents: simple agents that write actions in code. Dec 31, 2024 • 952
Qwen2.5-1M Collection The long-context version of Qwen2.5, supporting context lengths of up to 1M tokens • 3 items • Updated Feb 26 • 111