---
datasets:
- Anthropic/hh-rlhf
language:
- en
tags:
- rlhf
model-index:
- name: deberta-v3-large-tasksource-rlhf-reward-model
  results:
  - task:
      type: text-classification
      name: RLHF
    dataset:
      type: rlhf
      name: Anthropic/hh-rlhf
      split: validation
    metrics:
    - type: accuracy
      value: 0.7516
      verified: true
---

# Reward model based on [`deberta-v3-large-tasksource-nli`](https://huggingface.co/sileod/deberta-v3-large-tasksource-nli), fine-tuned on Anthropic/hh-rlhf

Fine-tuned for 1 epoch with a learning rate of 1e-5.
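
Below is a minimal usage sketch. The repo id `sileod/deberta-v3-large-tasksource-rlhf-reward-model` and the single-logit classification head are assumptions; adjust them to the actual checkpoint if they differ.

```python
# Minimal usage sketch. Assumptions (adjust as needed): the checkpoint is published as
# "sileod/deberta-v3-large-tasksource-rlhf-reward-model" and its head outputs a single reward logit.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "sileod/deberta-v3-large-tasksource-rlhf-reward-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

def reward(text: str) -> float:
    """Score a full dialogue string (prompt + response) with the reward model."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0, 0].item()

# hh-rlhf-style dialogues: "\n\nHuman: ...\n\nAssistant: ..."
chosen = "\n\nHuman: How do I bake bread?\n\nAssistant: Mix flour, water, yeast and salt, knead, let it rise, then bake at 230C."
rejected = "\n\nHuman: How do I bake bread?\n\nAssistant: Figure it out yourself."

# The preferred (chosen) response should receive the higher score.
print(reward(chosen), reward(rejected))
```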

The data are described in the paper: [Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback](https://arxiv.org/abs/2204.05862).

Validation accuracy is 75.16%, currently the best publicly reported on this dataset (vs. 69.25% for `OpenAssistant/reward-model-deberta-v3-large-v2`).
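
A minimal sketch of how such pairwise accuracy could be computed, assuming it is the fraction of preference pairs where the chosen response scores above the rejected one. It reuses the hypothetical `reward()` helper from the sketch above; `Anthropic/hh-rlhf` exposes `train`/`test` splits on the Hub, so `test` stands in for the validation split here.

```python
# Hypothetical pairwise-accuracy evaluation; reuses the reward() helper defined above.
from datasets import load_dataset

pairs = load_dataset("Anthropic/hh-rlhf", split="test")  # stand-in for the validation split
correct = sum(reward(ex["chosen"]) > reward(ex["rejected"]) for ex in pairs)
print(f"pairwise accuracy: {correct / len(pairs):.4f}")
```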