---
license: apache-2.0
datasets:
  - nicholasKluge/reward-aira-dataset
language:
  - en
metrics:
  - accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
  - reward model
  - alignment
  - preference model
  - RLHF
widget:
  - text: >-
      Why is AI Ethics important? [SEP] Who cares about AI Ethics? It's just a
      bunch of whining about humans making and using AI and bitching about what
      the machines do.
    example_title: Bad Response
  - text: >-
      Why is AI Ethics important? [SEP] The field of AI Ethics delves deeply
      into the intricate ethical considerations that arise with respect to AI
      systems. This includes the role of humanity in creating and deploying
      these systems, as well as the conduct of machines themselves. Broadly
      speaking, AI Ethics can be divided into two major categories: concerns
      surrounding the morality of human actions in relation to creating and
      using AI, and concerns regarding the moral implications of machine
      behavior.
    example_title: Good Response
---

# RewardModel

The RewardModel is a BERT model that can be used to score the quality of a completion for a given prompt.

The model was trained on a dataset composed of `prompt`, `prefered_completions`, and `rejected_completions` columns.

These prompt + completion pairs are samples from instruction datasets created via the Self-Instruct framework.
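
The card does not spell out the training objective, but reward models trained on preferred/rejected pairs are commonly fit with a pairwise ranking (Bradley-Terry) loss that pushes the score of the preferred completion above the score of the rejected one. Below is a minimal sketch of that common objective; it is an illustration, not code taken from the training notebook:

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(chosen_scores: torch.Tensor,
                          rejected_scores: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    Each tensor holds one scalar reward per preference pair, shape (batch,).
    """
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Illustrative scores for a batch of two preference pairs: the loss is
# small when the chosen scores sit well above the rejected ones.
chosen = torch.tensor([4.8, 2.1])
rejected = torch.tensor([-11.6, 0.3])
print(pairwise_ranking_loss(chosen, rejected))
```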

## Details

- **Size:** 109,038,209 parameters
- **Dataset:** [Reward-Aira Dataset](https://huggingface.co/datasets/nicholasKluge/reward-aira-dataset)
- **Language:** English
- **Number of Epochs:** 5
- **Batch size:** 42
- **Optimizer:** `torch.optim.AdamW`
- **Learning Rate:** 5e-5
- **GPU:** 1 NVIDIA A100-SXM4-40GB
- **Emissions:** 0.17 kg CO2
- **Total Energy Consumption:** 0.48 kWh

| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 200  | 0.080300      | 0.037106        | 0.987499 |
| 400  | 0.039300      | 0.036421        | 0.988433 |
| 600  | 0.037200      | 0.041799        | 0.986447 |
| 800  | 0.011400      | 0.039411        | 0.989602 |
| 1000 | 0.013800      | 0.039781        | 0.989718 |
| 1200 | 0.012700      | 0.034337        | 0.990887 |
| 1400 | 0.005200      | 0.037403        | 0.991120 |
| 1600 | 0.001800      | 0.047661        | 0.990653 |
| 1800 | 0.000900      | 0.051354        | 0.991237 |
| 2000 | 0.001000      | 0.046224        | 0.990419 |
| 2200 | 0.000200      | 0.046582        | 0.991120 |
| 2400 | 0.000600      | 0.046632        | 0.990536 |
| 2600 | 0.000100      | 0.051437        | 0.990770 |
| 2800 | 0.000500      | 0.049085        | 0.990887 |
| 3000 | 0.000400      | 0.049938        | 0.991004 |

This repository contains the notebook used to train this model.
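
For reference, the hyperparameters from the Details list map onto a standard `transformers.TrainingArguments` setup roughly as follows. This is a hedged sketch of how such a run could be configured; the notebook in this repository is the authoritative source, and `output_dir` plus the evaluation/logging cadence are assumptions:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters from the Details list; the 200-step
# evaluation cadence is inferred from the training table above.
training_args = TrainingArguments(
    output_dir="RewardModel",        # assumption: any local path works
    num_train_epochs=5,              # Number of Epochs: 5
    per_device_train_batch_size=42,  # Batch size: 42
    learning_rate=5e-5,              # Learning Rate: 5e-5
    optim="adamw_torch",             # Optimizer: torch.optim.AdamW
    evaluation_strategy="steps",
    eval_steps=200,
    logging_steps=200,
)
```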

## Usage

Here's an example of how to use the RewardModel to score the quality of a response to a given prompt:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("nicholasKluge/RewardModel")
rewardModel = AutoModelForSequenceClassification.from_pretrained("nicholasKluge/RewardModel")

rewardModel.eval()
rewardModel.to(device)

# Define the question and responses
prompt = "Why is AI Ethics important?"
response_good = "The field of AI Ethics delves deeply into the intricate ethical considerations that arise with respect to AI systems. This includes the role of humanity in creating and deploying these systems, as well as the conduct of machines themselves. Broadly speaking, AI Ethics can be divided into two major categories: concerns surrounding the morality of human actions in relation to creating and using AI, and concerns regarding the moral implications of machine behavior."
response_bad = "Who cares about AI Ethics? It's just a bunch of whining about humans making and using AI and bitching about what the machines do."

# Tokenize the question and responses as sentence pairs
tokens_good = tokenizer(prompt, response_good,
                        truncation=True,
                        max_length=512,
                        return_token_type_ids=False,
                        return_tensors="pt",
                        return_attention_mask=True)

tokens_bad = tokenizer(prompt, response_bad,
                       truncation=True,
                       max_length=512,
                       return_token_type_ids=False,
                       return_tensors="pt",
                       return_attention_mask=True)

tokens_good = tokens_good.to(device)
tokens_bad = tokens_bad.to(device)

# The model returns a single logit: the reward score for the pair
with torch.no_grad():
    score_good = rewardModel(**tokens_good)[0].item()
    score_bad = rewardModel(**tokens_bad)[0].item()

print(f"Question: {prompt} \n")
print(f"Response 1: {response_good} Score: {score_good:.3f}")
print(f"Response 2: {response_bad} Score: {score_bad:.3f}")
```

This will output the following:

```
>>> Question: Why is AI Ethics important?

>>> Response 1: The field of AI Ethics delves deeply into the intricate ethical considerations that arise with respect to AI systems. This includes the role of humanity in creating and deploying these systems, as well as the conduct of machines themselves. Broadly speaking, AI Ethics can be divided into two major categories: concerns surrounding the morality of human actions in relation to creating and using AI, and concerns regarding the moral implications of machine behavior. Score: 4.777
>>> Response 2: Who cares about AI Ethics? It's just a bunch of whining about humans making and using AI and bitching about what the machines do. Score: -11.582
```
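
Because the model emits a single unbounded logit, raw scores are only meaningful relative to one another. Under the usual Bradley-Terry reading of reward scores, the difference between two scores maps to a preference probability through a sigmoid. This interpretation is a sketch, not an official API of the model:

```python
import torch

# Scores printed in the example above
score_good, score_bad = 4.777, -11.582

# P(response 1 preferred over response 2) = sigmoid(score_good - score_bad)
prob = torch.sigmoid(torch.tensor(score_good - score_bad)).item()
print(f"P(response 1 preferred): {prob:.6f}")  # ~1.0
```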

## Performance

|                  | Accuracy (WebGPT) |
|------------------|-------------------|
| Aira-RewardModel | 96.54%*           |

- \*Only considering comparisons of the [`webgpt_comparisons`](https://huggingface.co/datasets/openai/webgpt_comparisons) dataset that had a preferred option.
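
A hedged sketch of how such a pairwise accuracy can be computed: score the preferred and the rejected answer of every comparison that has a clear winner, and count how often the preferred answer receives the higher reward. The snippet reuses `tokenizer`, `rewardModel`, and `device` from the Usage section; the field names follow the published schema of `openai/webgpt_comparisons`, and the exact evaluation script behind the number above is not part of this card:

```python
from datasets import load_dataset
import torch

def reward(prompt, answer):
    """Score a (prompt, answer) pair with the reward model."""
    tokens = tokenizer(prompt, answer,
                       truncation=True,
                       max_length=512,
                       return_token_type_ids=False,
                       return_tensors="pt",
                       return_attention_mask=True).to(device)
    with torch.no_grad():
        return rewardModel(**tokens)[0].item()

data = load_dataset("openai/webgpt_comparisons", split="train")

correct, total = 0, 0
for row in data:
    if row["score_0"] == row["score_1"]:
        continue  # tie: no preferred option, excluded per the note above
    prompt = row["question"]["full_text"]
    if row["score_0"] > row["score_1"]:
        preferred, rejected = row["answer_0"], row["answer_1"]
    else:
        preferred, rejected = row["answer_1"], row["answer_0"]
    correct += reward(prompt, preferred) > reward(prompt, rejected)
    total += 1

print(f"Pairwise accuracy: {correct / total:.2%}")
```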

## License

The RewardModel is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.