Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published Feb 2 • 140
FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published Dec 17, 2024 • 77
Open ASR Leaderboard 🏆: Compare speech‑to‑text models across multiple benchmarks • Running on CPU Upgrade • Featured • 1.36k
Llasa 3b Tts 🔥: Zero Shot voice cloning with llasa 3b (Unofficial Demo) • Running on Zero • 314
Wan2.1 💻: Wan: Open and Advanced Large-Scale Video Generative Models • Running • Featured • 2.11k
VBench Leaderboard 📊: Submit video model evaluation results to a public benchmark • Running • 354