---
license: apache-2.0
datasets:
- nicholasKluge/reward-aira-dataset
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- reward model
- alignment
- preference model
- RLHF
widget:
- text: "Why is AI Ethics important? [SEP] Who cares about AI Ethics? It's just a bunch of whining about humans making and using AI and bitching about what the machines do."
example_title: "Bad Response"
- text: "Why is AI Ethics important? [SEP] The field of AI Ethics delves deeply into the intricate ethical considerations that arise with respect to AI systems. This includes the role of humanity in creating and deploying these systems, as well as the conduct of machines themselves. Broadly speaking, AI Ethics can be divided into two major categories : concerns surrounding the morality of human actions in relation to creating and using AI, and concerns regarding the moral implications of machine behavior."
example_title: "Good Response"
---
# RewardModel
The `RewardModel` is a [BERT](https://huggingface.co/bert-base-cased) model that can be used to score the quality of a completion for a given prompt.
The model was trained on a dataset composed of `prompt`, `prefered_completions`, and `rejected_completions`.
These prompt-completion pairs come from instruction datasets created via the [Self-Instruct](https://github.com/yizhongw/self-instruct) framework.
## Details
- **Size:** 109,038,209 parameters
- **Dataset:** [Reward-Aira Dataset](https://huggingface.co/datasets/nicholasKluge/reward-aira-dataset)
- **Language:** English
- **Number of Epochs:** 5
- **Batch size:** 42
- **Optimizer:** `torch.optim.AdamW`
- **Learning Rate:** 5e-5
- **GPU:** 1 NVIDIA A100-SXM4-40GB
- **Emissions:** 0.17 kg CO2
- **Total Energy Consumption:** 0.48 kWh
| Step | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 200 |0.080300|0.037106|0.987499|
| 400 |0.039300|0.036421|0.988433|
| 600 |0.037200|0.041799|0.986447|
| 800 |0.011400|0.039411|0.989602|
| 1000 |0.013800|0.039781|0.989718|
| 1200 |0.012700|0.034337|0.990887|
| 1400 |0.005200|0.037403|0.991120|
| 1600 |0.001800|0.047661|0.990653|
| 1800 |0.000900|0.051354|0.991237|
| 2000 |0.001000|0.046224|0.990419|
| 2200 |0.000200|0.046582|0.991120|
| 2400 |0.000600|0.046632|0.990536|
| 2600 |0.000100|0.051437|0.990770|
| 2800 |0.000500|0.049085|0.990887|
| 3000 |0.000400|0.049938|0.991004|
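The card does not state the training objective explicitly. A common choice for reward models trained on preferred/rejected pairs is a pairwise log-sigmoid ranking loss, which penalizes the model whenever a rejected completion is scored above a preferred one. A minimal arithmetic sketch (plain Python, standing in for a batched PyTorch implementation; this is an assumption about the objective, not a statement of what the notebook does):

```python
import math

def pairwise_ranking_loss(preferred_scores, rejected_scores):
    # -log(sigmoid(s_preferred - s_rejected)), averaged over the batch.
    # The loss shrinks as preferred completions are scored above rejected ones.
    losses = [math.log1p(math.exp(-(p - r)))
              for p, r in zip(preferred_scores, rejected_scores)]
    return sum(losses) / len(losses)

# Dummy scores standing in for model outputs on three pairs
loss = pairwise_ranking_loss([2.0, 1.5, 0.3], [-1.0, 0.5, 0.8])
print(f"{loss:.4f}")  # → 0.4453
```

Note that the third pair (0.3 vs 0.8) is mis-ranked, which is what keeps the loss well above zero.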
This repository contains the notebook used to train this model.
## Usage
Here's an example of how to use the `RewardModel` to score the quality of a response to a given prompt:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("nicholasKluge/RewardModel")
rewardModel = AutoModelForSequenceClassification.from_pretrained("nicholasKluge/RewardModel")

rewardModel.eval()
rewardModel.to(device)

# Define the question and two candidate responses
prompt = "Why is AI Ethics important?"
response_good = "The field of AI Ethics delves deeply into the intricate ethical considerations that arise with respect to AI systems. This includes the role of humanity in creating and deploying these systems, as well as the conduct of machines themselves. Broadly speaking, AI Ethics can be divided into two major categories : concerns surrounding the morality of human actions in relation to creating and using AI, and concerns regarding the moral implications of machine behavior."
response_bad = "Who cares about AI Ethics? It's just a bunch of whining about humans making and using AI and bitching about what the machines do."

# Tokenize the question together with each response (encoded as "prompt [SEP] response")
tokens_good = tokenizer(prompt, response_good,
                        truncation=True,
                        max_length=512,
                        return_token_type_ids=False,
                        return_tensors="pt",
                        return_attention_mask=True)

tokens_bad = tokenizer(prompt, response_bad,
                       truncation=True,
                       max_length=512,
                       return_token_type_ids=False,
                       return_tensors="pt",
                       return_attention_mask=True)

tokens_good = tokens_good.to(device)
tokens_bad = tokens_bad.to(device)

# The model outputs a single logit per input: higher means a better response
with torch.no_grad():
    score_good = rewardModel(**tokens_good)[0].item()
    score_bad = rewardModel(**tokens_bad)[0].item()

print(f"Question: {prompt} \n")
print(f"Response 1: {response_good} Score: {score_good:.3f}")
print(f"Response 2: {response_bad} Score: {score_bad:.3f}")
```
This will output the following:
```markdown
>>> Question: Why is AI Ethics important?
>>> Response 1: The field of AI Ethics delves deeply into the intricate ethical considerations that arise with respect to AI systems. This includes the role of humanity in creating and deploying these systems, as well as the conduct of machines themselves. Broadly speaking, AI Ethics can be divided into two major categories : concerns surrounding the morality of human actions in relation to creating and using AI, and concerns regarding the moral implications of machine behavior. Score: 4.777
>>> Response 2: Who cares about AI Ethics? It's just a bunch of whining about humans making and using AI and bitching about what the machines do. Score: -11.582
```
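The raw scores are unnormalized logits, so their absolute values are not directly interpretable. Under the usual Bradley-Terry formulation (an assumption here, since the card does not specify one), the probability that one response is preferred over another is the sigmoid of the score difference. With the example scores above:

```python
import math

def preference_probability(score_a, score_b):
    # Bradley-Terry: P(a preferred over b) = sigmoid(score_a - score_b)
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

# Scores from the example output above
print(round(preference_probability(4.777, -11.582), 6))  # ≈ 1.0
print(round(preference_probability(0.0, 0.0), 2))        # 0.5 for equal scores
```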
## Performance
| Model | [WebGPT](https://huggingface.co/datasets/openai/webgpt_comparisons) |
|---|---|
| [Aira-RewardModel](https://huggingface.co/nicholasKluge/RewardModel) | 96.54%* |

\* Only considering comparisons in the `webgpt_comparisons` dataset that had a preferred option.
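A figure like this can be computed by scoring both completions of each comparison and counting how often the preferred one receives the higher score. A hedged sketch, assuming a `score(prompt, completion)` function wrapping the model as in the Usage section; the stand-in scorer and toy data below are purely illustrative, not the actual evaluation:

```python
def pairwise_accuracy(comparisons, score):
    # comparisons: list of (prompt, preferred, rejected) tuples with a strict preference
    correct = sum(score(p, a) > score(p, b) for p, a, b in comparisons)
    return correct / len(comparisons)

# Illustrative stand-in scorer: longer answers get higher scores (NOT the real model)
toy_score = lambda prompt, completion: len(completion)
toy_data = [
    ("Why is the sky blue?", "Rayleigh scattering of sunlight.", "Magic."),
    ("What is 2+2?", "Four.", "2+2 equals four, of course."),
]
print(pairwise_accuracy(toy_data, toy_score))  # → 0.5 on this toy data
```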
## License
The `RewardModel` is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.