Foundation Text-Generation Models Below 360M Parameters Collection Great candidates for fine-tuning targeting Wllama and Transformers.js for mobile devices, ordered by number of parameters. • 31 items • Updated 2 days ago • 24
Ovis2 Collection Our latest advancement in multi-modal large language models (MLLMs) • 8 items • Updated 10 days ago • 51
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published 7 days ago • 91
Step-Audio Collection Step-Audio model family, including Audio-Tokenizer, Audio-Chat and TTS • 3 items • Updated 10 days ago • 28
Hamanasu Collection A brand new series of Models from yours truly, Designed for Intelligence, Creativity and Roleplay. • 9 items • Updated 12 days ago • 4
OLMoE (January 2025) Collection Improved OLMoE for iOS app. Read more: https://allenai.org/blog/olmoe-app • 10 items • Updated 16 days ago • 9
SFTvsRL Models & Data Collection This collection contains 4 initial checkpoints for https://github.com/LeslieTrue/SFTvsRL and necessary data for V-IRL training. • 5 items • Updated 22 days ago • 8
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published 29 days ago • 108
DeepSeek R1 (All Versions) Collection DeepSeek R1 - the most powerful reasoning open-source model - available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 29 items • Updated about 6 hours ago • 201
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 8 items • Updated 3 days ago • 370
Qwen2.5-1M Collection The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated about 22 hours ago • 102
Dolphin 3.0 Collection Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models. Designed to be the ultimate general purpose local model. • 9 items • Updated 20 days ago • 95