arxiv:2410.18451

Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

Published on Oct 24 · Submitted by chrisliu298 on Oct 25

Abstract

In this report, we introduce a collection of methods to enhance reward modeling for LLMs, focusing specifically on data-centric techniques. We propose effective data selection and filtering strategies for curating high-quality open-source preference datasets, culminating in the Skywork-Reward data collection, which contains only 80K preference pairs -- significantly smaller than existing datasets. Using this curated dataset, we developed the Skywork-Reward model series -- Skywork-Reward-Gemma-27B and Skywork-Reward-Llama-3.1-8B -- with the former currently holding the top position on the RewardBench leaderboard. Notably, our techniques and datasets have directly enhanced the performance of many top-ranked models on RewardBench, highlighting the practical impact of our contributions in real-world preference learning applications.

Community

Paper author · Paper submitter · edited Oct 25

We have recently updated our Skywork Reward 27B and 8B reward models, along with the dataset, to v0.2. Check out the links below!

Skywork-Reward-Llama-3.1-8B-v0.2: https://huggingface.co/Skywork/Skywork-Reward-Llama-3.1-8B-v0.2
Skywork-Reward-Gemma-2-27B-v0.2: https://huggingface.co/Skywork/Skywork-Reward-Gemma-2-27B-v0.2
Skywork-Reward-Preference-80K-v0.2: https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.2

