Interpreting Language Model Preferences Through the Lens of Decision Trees

RewardBench Leaderboard (Jan 2025)

| Rank | Model | Base Model | Method | Overall Score | Chat | Chat Hard | Safety | Reasoning |
|------|-------|------------|--------|---------------|------|-----------|--------|-----------|
| 1 | Decision-Tree-Reward-Gemma-2-27B | Gemma-2-27B | Decision Tree | 95.4 | 96.9 | 91.4 | 93.9 | 99.2 |
| 2 | INF-QRM-Llama3.1-70B | Llama-3.1-70B | Sequence Classifier | 95.1 | 96.6 | 91.0 | 93.6 | 99.1 |
| 3 | Decision-Tree-Reward-Llama-3.1-8B | Llama-3.1-8B | Decision Tree | 94.5 | 96.6 | 89.5 | 93.2 | 98.6 |
| 4 | QRM-Gemma-2-27B | Gemma-2-27B | Sequence Classifier | 94.4 | 96.6 | 90.1 | 92.7 | 98.3 |
| 5 | Skywork-Reward-Gemma-2-27B-v0.2 | Gemma-2-27B | Sequence Classifier | 94.3 | 96.1 | 89.9 | 93.0 | 98.1 |
| 6 | Llama-3.1-Nemotron-70B-Reward | Llama-3.1-70B | Custom Classifier | 94.1 | 97.5 | 85.7 | 95.1 | 98.1 |
| 7 | Skywork-Reward-Gemma-2-27B | Gemma-2-27B | Sequence Classifier | 93.8 | 95.8 | 91.4 | 91.9 | 96.1 |
| 8 | TextEval-Llama3.1-70B | Llama-3.1-70B | Generative | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
| 9 | MetaMetrics-RM-v1.0 | - | Custom Classifier | 93.4 | 98.3 | 86.4 | 90.8 | 98.2 |
| 10 | Skywork-Critic-Llama-3.1-70B | Llama-3.1-70B | Generative | 93.3 | 96.6 | 87.9 | 93.1 | 95.5 |
| 11 | QRM-Llama3.1-8B-v2 | Llama-3.1-8B | Sequence Classifier | 93.1 | 96.4 | 86.8 | 92.6 | 96.8 |
| 12 | Skywork-Reward-Llama-3.1-8B-v0.2 | Llama-3.1-8B | Sequence Classifier | 93.1 | 94.7 | 88.4 | 92.7 | 96.7 |

Usage Code

Before using the model, ensure you have the following dependencies installed:

  • transformers==4.45.2
  • torch>=2.5.0
  • flash-attn>=2.6.3

Note: This code requires a GPU with NVIDIA Ampere architecture or newer.
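If you want to verify this requirement programmatically before loading the model, you can query the GPU's CUDA compute capability with PyTorch (Ampere corresponds to compute capability 8.0, which flash-attn 2 needs). The check below is a minimal sketch and not part of the official usage code:

import torch

# flash_attention_2 requires an NVIDIA GPU with compute capability >= 8.0 (Ampere or newer)
assert torch.cuda.is_available(), "A CUDA-capable GPU is required"
major, minor = torch.cuda.get_device_capability()
if (major, minor) < (8, 0):
    print(f"Detected compute capability {major}.{minor}; flash_attention_2 needs Ampere (8.0) or newer. "
          "Consider passing attn_implementation='eager' instead.")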

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Decision-Tree-Reward-Llama-3.1-8B"  # Another choice is "Decision-Tree-Reward-Gemma-2-27B"
repo_id = f"RLHFlow/{model_name}"
device = "cuda"
# Initialize the model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map=device,
)
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
# Load the decision tree
model.load_decision_tree(repo_id, filename="decision_tree.pkl")

# Prompt and response pairs
prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

# Compare the two responses
output = model.compare(prompt, response1, response2, tokenizer, device)
print("Response 1 rewards")
print(dict(zip(output["attributes"], output["rewards"][0])))
# {'helpfulness': 3.9603815, 'correctness': 3.9727726, 'coherence': 3.8582935, 'complexity': 0.9909791, 'verbosity': 1.4901903}
print("Response 2 rewards")
print(dict(zip(output["attributes"], output["rewards"][1])))
# {'helpfulness': 2.1698856, 'correctness': 2.2035594, 'coherence': 3.2032843, 'complexity': 0.8786768, 'verbosity': 1.4569137}
print("Model preference")
print(output["preference"])
# 0
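The preference field indexes the preferred response (0 for Response 1, 1 for Response 2); here the decision tree prefers the correct Response 1. To see which attributes drive the decision, you can compare the per-attribute rewards directly. The loop below is a small sketch that assumes only the output fields shown above:

# Inspect which attributes separate the two responses the most
preferred = output["preference"]  # 0 -> Response 1, 1 -> Response 2
print(f"Decision tree prefers Response {preferred + 1}")
for attr, r1, r2 in zip(output["attributes"], output["rewards"][0], output["rewards"][1]):
    print(f"{attr}: {r1:.3f} vs {r2:.3f} (diff {r1 - r2:+.3f})")
# In this example the largest gaps are in helpfulness and correctness (about 1.8 each),
# consistent with the arithmetic mistake in Response 2.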

License

Note: This model is finetuned from a Skywork model under the following license agreement:

Community usage of the Skywork model requires the Skywork Community License. The Skywork model supports commercial use. If you plan to use the Skywork model or its derivatives for commercial purposes, you must abide by the terms and conditions within the Skywork Community License.

To-Do

  • Reward Model Usage code
  • Architecture diagram
