Phi-4 Collection Phi-4 family of small language and multi-modal models. • 7 items • Updated 12 days ago • 110
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published about 1 month ago • 143
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 204
view article Article FineWeb2-C: Help Build Better Language Models in Your Language By davanstrien and 5 others • Dec 23, 2024 • 19
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 135
Llama 3.3 Collection This collection hosts the transformers and original repos of the Llama 3.3 • 1 item • Updated Dec 6, 2024 • 144
Common Models Collection The first generation of models pretrained on Common Corpus. • 5 items • Updated Dec 5, 2024 • 30
LLäMmlein Chat Preview 🐑 Collection https://www.informatik.uni-wuerzburg.de/datascience/projects/nlp/llammlein/ • 8 items • Updated Nov 22, 2024 • 11
Cautious Optimizers: Improving Training with One Line of Code Paper • 2411.16085 • Published Nov 25, 2024 • 20
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated 23 days ago • 246
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024 • 227