Chat with Kimi-VL-A3B-Thinking 🤔: Chat with Kimi-VL-A3B-Thinking using text and images
Qwen2.5 Omni 7B Demo 🏆: Generate text and speech responses from text, images, or audio input
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion (Paper • 2503.11576 • Published Mar 14)