Commit 01e9e9f · 1 Parent(s): 0acfb00
Update README.md
README.md CHANGED
@@ -15,7 +15,7 @@ tags:
 ---
 # RewardModel (Portuguese-BR)
 
-The `RewardModel` is a modified BERT model that can be used to score the quality of completion to a given prompt. It is based on
+The `RewardModel` is a modified BERT model that can be used to score the quality of completion to a given prompt. It is based on a [BERT model](https://huggingface.co/bert-base-cased), modified to act as a regression model.
 
 The `RewardModel` allows the specification of an $\alpha$ parameter, which is a multiplier to the reward score. This multiplier is set to 1 during training (since our reward values are bounded between -1 and 1) but can be changed at inference to allow for rewards with higher bounds.
 
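
The $\alpha$ multiplier described in the README text above can be illustrated with a small sketch. This is not the model's actual code: the function name `scale_reward` and the use of `tanh` to bound the raw score are assumptions made for illustration only. The README states only that rewards lie between -1 and 1 and that $\alpha$ multiplies the score.

```python
import math


def scale_reward(raw_score: float, alpha: float = 1.0) -> float:
    """Bound a raw regression output to [-1, 1], then multiply by alpha.

    Hypothetical helper: the real RewardModel may bound its output differently.
    """
    # Assumption: tanh is used here only as one common way to bound a score.
    bounded = math.tanh(raw_score)
    # alpha = 1.0 keeps the training-time range [-1, 1];
    # a larger alpha widens the range to [-alpha, alpha] at inference.
    return alpha * bounded


if __name__ == "__main__":
    print(scale_reward(0.5))              # stays within [-1, 1]
    print(scale_reward(0.5, alpha=10.0))  # same score, range widened to [-10, 10]
```

With `alpha=1.0` the output stays in the training range; raising `alpha` at inference rescales the same score linearly without changing the ranking of completions.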