Collections

Collections including paper arxiv:2412.06559

LLM Reasoning Papers

Papers to improve reasoning capabilities of LLMs

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

Paper • 2408.07199 • Published Aug 13 • 21
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14 • 75
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Paper • 2406.12050 • Published Jun 17 • 19
Let's Verify Step by Step

Paper • 2305.20050 • Published May 31, 2023 • 10

Pending Classification

Video Creation by Demonstration

Paper • 2412.09551 • Published 12 days ago • 8
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published 15 days ago • 45
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Paper • 2412.06531 • Published 16 days ago • 71
APOLLO: SGD-like Memory, AdamW-level Performance

Paper • 2412.05270 • Published 18 days ago • 38

Free Process Rewards without Process Labels

Paper • 2412.01981 • Published 22 days ago • 28
ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published 16 days ago • 68
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning

Paper • 2410.01044 • Published Oct 1 • 34
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

Paper • 2411.16579 • Published 30 days ago • 1

Reinforcement Learning: An Overview

Paper • 2412.05265 • Published 18 days ago • 4
ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published 16 days ago • 68

GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Paper • 2411.18499 • Published 28 days ago • 18
VLSBench: Unveiling Visual Leakage in Multimodal Safety

Paper • 2411.19939 • Published 25 days ago • 9
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

Paper • 2412.02611 • Published 21 days ago • 22
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs

Paper • 2412.03205 • Published 21 days ago • 15

A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

Paper • 2410.13639 • Published Oct 17 • 16
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Paper • 2410.18693 • Published Oct 24 • 40
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs

Paper • 2412.03205 • Published 21 days ago • 15
Free Process Rewards without Process Labels

Paper • 2412.01981 • Published 22 days ago • 28
