Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models Paper • 2504.05262 • Published 15 days ago • 11 • 6
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published 11 days ago • 120 • 10
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published 21 days ago • 82 • 5
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published 14 days ago • 103
Generative Evaluation of Complex Reasoning in Large Language Models Paper • 2504.02810 • Published 19 days ago • 14 • 5
Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning Paper • 2503.04973 • Published Mar 6 • 23 • 7
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries Paper • 2502.20475 • Published Feb 27 • 3 • 4
IHEval: Evaluating Language Models on Following the Instruction Hierarchy Paper • 2502.08745 • Published Feb 12 • 19
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Paper • 2502.10458 • Published Feb 12 • 35
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 155