MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published 24 days ago • 128
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published 23 days ago • 94
FLUX.1 Collection A collection of our FLUX.1 models and LoRAs. • 8 items • Updated 8 days ago • 65
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published Feb 26 • 63
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 143
Chat With Janus-Pro-7B A unified multimodal understanding and generation model. • Running on Zero • 1.95k