InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 2 days ago • 204
DataDecide Collection A suite of models, data, and evals over 25 corpora, 14 sizes, and 3 seeds to measure how accurately small experiments predict rankings at large scale. • 358 items • Updated about 10 hours ago • 8
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning Paper • 2504.08600 • Published 6 days ago • 23
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 7 days ago • 66
Orpheus Multilingual Research Release Collection Beta Release of multilingual models. • 12 items • Updated 6 days ago • 74
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published 15 days ago • 73
Article Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC 8 days ago • 19
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 8 days ago • 141
An Empirical Study of GPT-4o Image Generation Capabilities Paper • 2504.05979 • Published 9 days ago • 59
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published 9 days ago • 41
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated 5 days ago • 59
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The model preserves quality similar to half precision while using 3x less memory • 8 items • Updated 14 days ago • 117
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 9 days ago • 160
Article Arabic Leaderboards: Introducing Arabic Instruction Following, Updating AraGen, and More 9 days ago • 15