Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 5 days ago • 129
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated about 15 hours ago • 42
Hibiki fr-en Collection Hibiki is a model for streaming speech translation, which can run on device! See https://github.com/kyutai-labs/hibiki. • 5 items • Updated 15 days ago • 49
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 17 days ago • 187
Article KV Caching Explained: Optimizing Transformer Inference Efficiency By not-lain • 23 days ago • 30
most ducked models 🦆🦆🦆 Collection https://x.com/jeremyphoward/status/1881264223646576786 • 5 items • Updated Jan 20 • 3