---
license: apache-2.0
datasets:
- nicholasKluge/reward-aira-dataset
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- reward model
- alignment
- preference model
---

# RewardModel

The `RewardModel` is a modified BERT model that can be used to score the quality of a completion to a given prompt. It is based on a [BERT model](https://huggingface.co/bert-base-cased), modified to act as a regression model.

The `RewardModel` accepts an `alpha` parameter, a multiplier applied to the reward score. This multiplier is set to 1 during training (since our reward values are bounded between -1 and 1) but can be changed at inference to allow for rewards with higher bounds.

The model was trained with a dataset composed of `prompts`, `completions`, and annotated `rewards`.

> Note: These prompts + completions are samples of instruction datasets created via the [Self-Instruct](https://github.com/yizhongw/self-instruct) framework.

## Details

- **Size:** 109,038,209 parameters
- **Dataset:** [Reward-Aira Dataset](https://huggingface.co/datasets/nicholasKluge/reward-aira-dataset)
- **Number of Epochs:** 5
- **Batch size:** 64
- **Optimizer:** `torch.optim.Adam`
- **Learning Rate:** 1e-4
- **Loss Function:** `torch.nn.MSELoss()`
- **GPU:** 1 NVIDIA A100-SXM4-40GB
- **RMSE in testing:** 0.1190

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 0.06136 | 0.031815 |
| 2 | 0.030398 | 0.02459 |
| 3 | 0.02389 | 0.026569 |
| 4 | 0.024755 | 0.02109 |
| 5 | 0.019445 | 0.01416 |

> Note: This repository contains the notebook used to train this model.

## Usage

Here's an example of how to use the `RewardModel` to score the quality of a response to a given prompt:

```python
from transformers import AutoTokenizer, AutoConfig, AutoModel
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

config = AutoConfig.from_pretrained('nicholasKluge/RewardModel', trust_remote_code=True, revision='main')
tokenizer = AutoTokenizer.from_pretrained('nicholasKluge/RewardModel', trust_remote_code=True, config=config, revision='main')
rewardModel = AutoModel.from_pretrained('nicholasKluge/RewardModel', trust_remote_code=True, config=config, revision='main')

rewardModel.to(device)
rewardModel.eval()

# Define the question and responses
question = "What is the capital of France?"
response1 = "Paris, France's capital, is a major European city and a global center for art, fashion, gastronomy and culture."
response2 = "Google it pal."

# Tokenize the question and the first response
tokens = tokenizer(question, response1,
                   return_token_type_ids=False,
                   return_tensors="pt",
                   return_attention_mask=True)
tokens.to(device)

# Score the first response
score = rewardModel(**tokens, alpha=10).item()

print(f"Question: {question} \n")
print(f"Response 1: {response1} Score: {score:.3f}")

# Tokenize and score the second response
tokens = tokenizer(question, response2,
                   return_token_type_ids=False,
                   return_tensors="pt",
                   return_attention_mask=True)
tokens.to(device)

score = rewardModel(**tokens, alpha=10).item()

print(f"Response 2: {response2} Score: {score:.3f}")
```

This will output the following:

```markdown
>>> Question: What is the capital of France? 

>>> Response 1: Paris, France's capital, is a major European city and a global center for art, fashion, gastronomy and culture. Score: 3.183

>>> Response 2: Google it pal. Score: -5.781
```
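The Performance section below reports accuracy on preference-comparison datasets, where accuracy is typically the fraction of pairs in which the preferred completion receives a higher reward than the rejected one. The following is a minimal sketch of that kind of evaluation, reusing the `tokenizer`, `rewardModel`, and `device` objects defined above; the `pairs` list is a hypothetical stand-in for a real comparison dataset.

```python
import torch

def pairwise_accuracy(pairs, alpha=1):
    """Fraction of (prompt, chosen, rejected) triples where the chosen
    completion gets a higher reward than the rejected one."""
    correct = 0
    for prompt, chosen, rejected in pairs:
        scores = []
        for completion in (chosen, rejected):
            # Same tokenization and scoring pattern as the usage example above
            tokens = tokenizer(prompt, completion,
                               return_token_type_ids=False,
                               return_tensors="pt",
                               return_attention_mask=True).to(device)
            with torch.no_grad():
                scores.append(rewardModel(**tokens, alpha=alpha).item())
        correct += scores[0] > scores[1]
    return correct / len(pairs)

# Hypothetical preference pairs; replace with an actual comparison dataset
pairs = [
    ("What is the capital of France?",
     "Paris, France's capital, is a major European city and a global center for art, fashion, gastronomy and culture.",
     "Google it pal."),
]

print(f"Pairwise accuracy: {pairwise_accuracy(pairs):.2%}")
```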
## Performance

| Acc | [WebGPT](https://huggingface.co/datasets/openai/webgpt_comparisons) | [SyntheticGPT](https://huggingface.co/datasets/openai/summarize_from_feedback) |
|---|---|---|
| [Aira-RewardModel](https://huggingface.co/nicholasKluge/RewardModel) | 53.54% | 99.80% |

## License

The `RewardModel` is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.