OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 6 days ago • 66
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published 7 days ago • 97
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing Paper • 2504.07964 • Published 5 days ago • 58
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published 14 days ago • 72
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem Paper • 2411.17525 • Published Nov 26, 2024 • 3