Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach • Paper • arXiv:2502.05171 • Published Feb 7, 2025
Jamba 1.5 • Collection • The AI21 Jamba family of state-of-the-art, hybrid SSM-Transformer, instruction-following foundation models • 2 items • Updated Mar 6
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • Paper • arXiv:2402.17764 • Published Feb 27, 2024
Granite Code Models • Collection • A series of code models trained by IBM and released under the Apache 2.0 license; both the base pretrained and instruct models are released • 23 items • Updated Feb 24
📀 Dataset comparison models • Collection • 1.8B-parameter models trained on 350B tokens to compare different pretraining datasets • 8 items • Updated Jun 12, 2024
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models • Paper • arXiv:2308.13137 • Published Aug 25, 2023
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers • Paper • arXiv:2210.17323 • Published Oct 31, 2022
LoRA: Low-Rank Adaptation of Large Language Models • Paper • arXiv:2106.09685 • Published Jun 17, 2021
AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning • Paper • arXiv:2308.03526 • Published Aug 7, 2023
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization • Paper • arXiv:2308.02151 • Published Aug 4, 2023