Update README.md
README.md
CHANGED
@@ -20,7 +20,7 @@ model-index:
 value: 0,7516
 verified: true
 ---
-# Reward model based `deberta-v3-large-tasksource-nli` fine-tuned on Anthropic/hh-rlhf
+# Reward model based [`deberta-v3-large-tasksource-nli`](https://huggingface.co/sileod/deberta-v3-large-tasksource-nli) fine-tuned on Anthropic/hh-rlhf
 For 1 epoch with 1e-5 learning rate.
 
 The data are described in the paper: [Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback](https://arxiv.org/abs/2204.05862).