Article: A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes (Aug 17, 2022)
Article: Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA (May 24, 2023)
Paper: Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate (arXiv:2501.17703, published Jan 29, 2025)
Paper: Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models (arXiv:2501.12370, published Jan 21, 2025)