arxiv:2503.02878

Language Models can Self-Improve at State-Value Estimation for Better Search

Published on Mar 4
· Submitted by emendes3 on Mar 5

Abstract

Collecting ground truth task completion rewards or human demonstrations for multi-step reasoning tasks is often cost-prohibitive and time-consuming, especially in interactive domains like web tasks. To address this bottleneck, we present self-taught lookahead, a self-supervised method that leverages state-transition dynamics to train a value model capable of effectively guiding language model-controlled search. We find that moderately sized (8 billion parameters) open-weight value models improved with self-taught lookahead can match the performance of using a frontier LLM such as gpt-4o as the value model. Furthermore, we find that self-taught lookahead improves performance by 20% while reducing costs 37x compared to previous LLM-based tree search, without relying on ground truth rewards.

Community

Paper author · Paper submitter

TLDR:

  • Conventionally, improving language models for search on reasoning tasks (e.g., web agents) requires human demonstrations or ground-truth rewards, which are expensive to collect
  • We propose self-taught lookahead (STL), a method that self-improves value models using only state transitions, by capturing the Bellman update in natural language
  • Specifically, we train the value model used to guide search with our self-supervised approach
  • We find that this yields a 39% performance improvement over the base value model, matching the performance of a GPT-4o value model
  • Search with STL value models is also 37x cheaper than previous search methods and 10x cheaper than using closed-source models
  • We also find STL works with very small value models (~3B parameters), which approach the performance of GPT-4o
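The "Bellman update" mentioned above can be sketched as a one-step lookahead target: a state's training label is derived from the best value among its successors under the current value model. This is a minimal illustrative sketch, not the paper's actual implementation; all names (`lookahead_value_target`, `base_value`, the toy states) are assumptions for illustration.

```python
# Minimal sketch of a one-step lookahead (Bellman-style) value target of the
# kind STL could use to create self-supervised labels from state transitions.
# Names and signatures are illustrative assumptions, not the paper's API.

def lookahead_value_target(state, successors, base_value, gamma=1.0):
    """Label `state` by looking one step ahead: the best successor's value
    (as judged by the current value model) becomes the training target."""
    if not successors:  # leaf: fall back to the model's direct estimate
        return base_value(state)
    return gamma * max(base_value(s) for s in successors)

# Toy usage: states are opaque strings, base_value is a lookup stub.
values = {"a": 0.2, "b": 0.7, "c": 0.4}
target = lookahead_value_target("root", ["a", "b", "c"], values.get)
# target == 0.7: an improved label for "root", usable to fine-tune the value model
```

In STL the value model is an LLM and the update is expressed in natural language (verbal reasoning about successor states) rather than scalar arithmetic, but the bootstrapping structure is the same: no ground-truth reward is needed, only observed state transitions.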

