---
datasets:
- Anthropic/hh-rlhf
language:
- en
tags:
- rlhf
model-index:
- name: deberta-v3-large-tasksource-rlhf-reward-model
  results:
  - task:
      type: text-classification
      name: RLHF
    dataset:
      type: rlhf
      name: Anthropic/hh-rlhf
      split: validation
    metrics:
    - type: accuracy
      value: 0.7516
      verified: true
---

# Reward model based on [`deberta-v3-large-tasksource-nli`](https://huggingface.co/sileod/deberta-v3-large-tasksource-nli), fine-tuned on Anthropic/hh-rlhf

Fine-tuned for 1 epoch with a learning rate of 1e-5.
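
Below is a minimal usage sketch. The repo id `sileod/deberta-v3-large-tasksource-rlhf-reward-model` and the single-logit classification head are assumptions; adjust them to the actual checkpoint if they differ.

```python
# Minimal usage sketch. Assumptions (adjust as needed): the checkpoint is published as
# "sileod/deberta-v3-large-tasksource-rlhf-reward-model" and its head outputs a single reward logit.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "sileod/deberta-v3-large-tasksource-rlhf-reward-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

def reward(text: str) -> float:
    """Score a full dialogue string (prompt + response) with the reward model."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0, 0].item()

# hh-rlhf-style dialogues: "\n\nHuman: ...\n\nAssistant: ..."
chosen = "\n\nHuman: How do I bake bread?\n\nAssistant: Mix flour, water, yeast and salt, knead, let it rise, then bake at 230C."
rejected = "\n\nHuman: How do I bake bread?\n\nAssistant: Figure it out yourself."

# The preferred (chosen) response should receive the higher score.
print(reward(chosen), reward(rejected))
```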

The data are described in the paper: [Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback](https://arxiv.org/abs/2204.05862).

Validation accuracy is 75.16%, currently the best publicly reported on this dataset (vs. 69.25% for `OpenAssistant/reward-model-deberta-v3-large-v2`).
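
A minimal sketch of how such pairwise accuracy could be computed, assuming it is the fraction of preference pairs where the chosen response scores above the rejected one. It reuses the hypothetical `reward()` helper from the sketch above; `Anthropic/hh-rlhf` exposes `train`/`test` splits on the Hub, so `test` stands in for the validation split here.

```python
# Hypothetical pairwise-accuracy evaluation; reuses the reward() helper defined above.
from datasets import load_dataset

pairs = load_dataset("Anthropic/hh-rlhf", split="test")  # stand-in for the validation split
correct = sum(reward(ex["chosen"]) > reward(ex["rejected"]) for ex in pairs)
print(f"pairwise accuracy: {correct / len(pairs):.4f}")
```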