The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 606
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information Paper • 2402.13616 • Published Feb 21, 2024 • 46
WARM: On the Benefits of Weight Averaged Reward Models Paper • 2401.12187 • Published Jan 22, 2024 • 18
Scalable Pre-training of Large Autoregressive Image Models Paper • 2401.08541 • Published Jan 16, 2024 • 36
Comparing DPO with IPO and KTO Collection A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO. • 56 items • Updated 4 days ago • 32
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning Paper • 2312.11461 • Published Dec 18, 2023 • 18
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention Paper • 2312.07987 • Published Dec 13, 2023 • 41
Juanako Top Models Collection The Juanako 7B models, trained with SFT, DDP, and UNA • 8 items • Updated Nov 23, 2024 • 4