arxiv:2411.08147

Large Language Models Can Self-Improve in Long-context Reasoning

Published on Nov 12 · Submitted by Siheng99 on Nov 14

#1 Paper of the day

Authors:

Siheng Li, Zesen Cheng, Lemao Liu, Yujiu Yang

Abstract

Large language models (LLMs) have achieved substantial progress in processing long contexts but still struggle with long-context reasoning. Existing approaches typically involve fine-tuning LLMs with synthetic data, which depends on annotations from human experts or advanced models like GPT-4, thus restricting further advancements. To address this issue, we investigate the potential for LLMs to self-improve in long-context reasoning and propose SeaLong, an approach specifically designed for this purpose. This approach is straightforward: we sample multiple outputs for each question, score them with Minimum Bayes Risk, and then apply supervised fine-tuning or preference optimization based on these outputs. Extensive experiments on several leading LLMs demonstrate the effectiveness of SeaLong, with an absolute improvement of 4.2 points for Llama-3.1-8B-Instruct. Furthermore, SeaLong achieves superior performance compared to prior approaches that depend on data produced by human experts or advanced models. We anticipate that this work will open new avenues for self-improvement techniques in long-context scenarios, which are essential for the continual advancement of LLMs.
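
The sample, score, and select loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the sampling step is replaced by a hard-coded list of outputs, token-overlap (Jaccard) similarity stands in for whatever similarity or utility metric the paper actually uses, and all function names (`jaccard_similarity`, `mbr_scores`, `build_training_examples`) are hypothetical.

```python
from typing import Callable, Dict, List


def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1].

    Assumption: a simple stand-in for a real similarity metric.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def mbr_scores(
    outputs: List[str],
    similarity: Callable[[str, str], float] = jaccard_similarity,
) -> List[float]:
    """Minimum Bayes Risk style scoring.

    Each output is scored by its mean similarity to the other sampled
    outputs, so outputs that agree with the consensus score higher.
    """
    n = len(outputs)
    if n < 2:
        return [0.0] * n
    scores = []
    for i, candidate in enumerate(outputs):
        total = sum(
            similarity(candidate, other)
            for j, other in enumerate(outputs)
            if j != i
        )
        scores.append(total / (n - 1))
    return scores


def build_training_examples(question: str, outputs: List[str]) -> Dict[str, dict]:
    """Turn scored samples into data for the two training options.

    The top-scored output becomes the supervised fine-tuning target; the
    top- and bottom-scored outputs form a preference pair.
    """
    scores = mbr_scores(outputs)
    ranked = sorted(zip(scores, outputs), key=lambda pair: pair[0], reverse=True)
    chosen, rejected = ranked[0][1], ranked[-1][1]
    return {
        "sft": {"prompt": question, "response": chosen},
        "preference": {"prompt": question, "chosen": chosen, "rejected": rejected},
    }


# Hard-coded samples standing in for multiple model generations.
sampled_outputs = [
    "the answer is Paris",
    "the answer is Paris",
    "the answer is Lyon",
    "I do not know",
]
examples = build_training_examples("What is the capital of France?", sampled_outputs)
print(examples["preference"])
```

In this toy run, the two agreeing outputs receive the highest scores and the outlier receives the lowest, so the preference pair contrasts the consensus answer with the outlier. Note that consensus is only a proxy for correctness: if most samples share the same mistake, this scoring will favor that mistake.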

Community

Siheng99

Paper author Paper submitter 2 days ago

Large Language Models Can Self-Improve in Long-context Reasoning

✨
1️⃣ We examine the unexplored potential of LLMs for long-context reasoning by analyzing diverse prompting techniques and expanding generation spaces.
2️⃣ We propose a novel method, SeaLong, designed to facilitate self-improvement of LLMs in long-context reasoning.
3️⃣ Extensive experiments across five tasks demonstrate the effectiveness of SeaLong, underscoring the potential of self-improvement in advancing LLMs.