QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models Paper • 2309.14717 • Published Sep 26, 2023 • 44
Reward-Guided Speculative Decoding for Efficient LLM Reasoning Paper • 2501.19324 • Published 10 days ago • 33
Article Mastering Long Contexts in LLMs with KVPress By nvidia and 1 other • 18 days ago • 61
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code Paper • 2410.08196 • Published Oct 10, 2024 • 46
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs Paper • 2410.04698 • Published Oct 7, 2024 • 13
PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search Paper • 1907.05737 • Published Jul 12, 2019
Trained Rank Pruning for Efficient Deep Neural Networks Paper • 1812.02402 • Published Dec 6, 2018 • 1