olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 3 items • Updated 1 day ago • 44
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated 8 days ago • 49
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging Paper • 2502.09056 • Published 15 days ago • 30
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub 17 days ago • 49
Hibiki fr-en Collection Hibiki is a model for streaming speech translation, which can run on device! See https://github.com/kyutai-labs/hibiki. • 5 items • Updated 22 days ago • 50
Reasoning Datasets Collection Distilled synthetic Reasoning datasets • 7 items • Updated 26 days ago • 55
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 8 items • Updated 4 days ago • 379
Article Introducing the Synthetic Data Generator - Build Datasets with Natural Language Dec 16, 2024 • 109
Article Train 400x faster Static Embedding Models with Sentence Transformers Jan 15 • 150
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 7 items • Updated about 2 hours ago • 81
InternVL2.5-MPO Collection Enhancing the Reasoning Ability of MLLMs via Mixed Preference Optimization • 16 items • Updated 30 days ago • 26