InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 9 days ago • 239
indic-evals Collection Translated versions of popular LLM benchmarks. • 4 items • Updated Oct 23, 2024 • 5
Orpheus Multilingual Research Release Collection Beta Release of multilingual models. • 12 items • Updated 12 days ago • 76
SANA-Sprint Collection SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation • 6 items • Updated 6 days ago • 35
SuperBPE Collection SuperBPE tokenizers and models trained with them • 8 items • Updated 13 days ago • 14
SANA-1.5 Collection SANA-1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer • 6 items • Updated 6 days ago • 4
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 226
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published Mar 13 • 28
EuroBERT Collection Scaling Multilingual Encoders for European Languages • 4 items • Updated Mar 10 • 11
Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing Paper • 2502.14458 • Published Feb 20 • 2
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers Paper • 2502.20545 • Published Feb 27 • 22