Reasoning - a kaizuberbuehler Collection

Reasoning

updated about 12 hours ago

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published about 1 month ago • 37
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published about 1 month ago • 45
Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published 25 days ago • 35
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 45
OpenAI o1 System Card

Paper • 2412.16720 • Published Dec 21, 2024 • 31
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought

Paper • 2412.17498 • Published Dec 23, 2024 • 21
Outcome-Refining Process Supervision for Code Generation

Paper • 2412.15118 • Published Dec 19, 2024 • 19
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

Paper • 2411.19943 • Published Nov 29, 2024 • 57
MALT: Improving Reasoning with Multi-Agent LLM Training

Paper • 2412.01928 • Published Dec 2, 2024 • 40
Mars-PO: Multi-Agent Reasoning System Preference Optimization

Paper • 2411.19039 • Published Nov 28, 2024 • 1
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning

Paper • 2410.22304 • Published Oct 29, 2024 • 17
o1-Coder: an o1 Replication for Coding

Paper • 2412.00154 • Published Nov 29, 2024 • 43
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Paper • 2411.14405 • Published Nov 21, 2024 • 58
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Paper • 2410.09671 • Published Oct 12, 2024 • 1
SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation

Paper • 2411.11053 • Published Nov 17, 2024 • 3
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS

Paper • 2411.18478 • Published Nov 27, 2024 • 34
Reverse Thinking Makes LLMs Stronger Reasoners

Paper • 2411.19865 • Published Nov 29, 2024 • 20
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

Paper • 2411.16579 • Published Nov 25, 2024 • 2
Vision-Language Models Can Self-Improve Reasoning via Reflection

Paper • 2411.00855 • Published Oct 30, 2024 • 5
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Paper • 2411.04282 • Published Nov 6, 2024 • 33
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 23
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Paper • 2411.18203 • Published Nov 27, 2024 • 34
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Paper • 2411.16489 • Published Nov 25, 2024 • 42
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Paper • 2411.14794 • Published Nov 22, 2024 • 13
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 72
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning

Paper • 2410.02884 • Published Oct 3, 2024 • 54
LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 113
Large Language Models Can Self-Improve in Long-context Reasoning

Paper • 2411.08147 • Published Nov 12, 2024 • 63
Self-Consistency Preference Optimization

Paper • 2411.04109 • Published Nov 6, 2024 • 17
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published 16 days ago • 245
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics

Paper • 2501.04686 • Published 16 days ago • 50
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published 16 days ago • 89
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning

Paper • 2501.03226 • Published 18 days ago • 37
Test-time Computing: from System-1 Thinking to System-2 Thinking

Paper • 2501.02497 • Published 19 days ago • 41
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM

Paper • 2501.01904 • Published 21 days ago • 31
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published 25 days ago • 36
Search-o1: Agentic Search-Enhanced Large Reasoning Models

Paper • 2501.05366 • Published 15 days ago • 79
The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published 11 days ago • 85
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning

Paper • 2501.06458 • Published 13 days ago • 29
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published 14 days ago • 59
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking

Paper • 2501.09751 • Published 8 days ago • 45
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Paper • 2501.09686 • Published 8 days ago • 35
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published 1 day ago • 132
Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published 2 days ago • 41