Where is the example?
Where is the README? I want to try it.
Hi, thanks for your interest; it's the same model interface as https://huggingface.co/OpenAssistant/reward-model-deberta-v3-large-v2
The model was trained to predict the better of two answers.
A forward pass of the frozen model provides a score that can be used as a loss signal for RLHF; there is some code out there to do that, but I didn't explore it.
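As a minimal sketch of that interface (assuming it matches the linked OpenAssistant/reward-model-deberta-v3-large-v2 checkpoint; the checkpoint name, question, and answers below are placeholders), scoring a (question, answer) pair looks roughly like this:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint; substitute the actual reward model discussed here.
model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # the reward model stays frozen

question = "Explain nuclear fusion like I am five."
answer_a = "Nuclear fusion is when two small atoms stick together and release energy."
answer_b = "I don't know."

with torch.no_grad():
    # The model scores a (question, answer) pair; a higher score means a better answer.
    score_a = model(**tokenizer(question, answer_a, return_tensors="pt")).logits[0].item()
    score_b = model(**tokenizer(question, answer_b, return_tensors="pt")).logits[0].item()

print(score_a, score_b)  # the better answer should receive the higher score
```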
Thank you, you're being very helpful. :)
It seems to me the output is higher than that of the model you mentioned earlier (https://huggingface.co/OpenAssistant/reward-model-deberta-v3-large-v2).
Is it true that, in general, your model has better discriminative performance?
The absolute value of the output doesn't really matter; what counts is the gradient, which comes from the difference between the scores for different inputs.
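To illustrate (a hedged sketch with made-up scores, assuming the usual pairwise / Bradley-Terry reading of a preference reward model), only the score difference affects the comparison:

```python
import torch

# Hypothetical scores for two answers to the same question.
score_a, score_b = torch.tensor(2.7), torch.tensor(-1.3)

# Probability that answer A is the better one depends only on the difference.
p_a_better = torch.sigmoid(score_a - score_b)

# Shifting both scores by the same constant leaves the comparison unchanged.
p_shifted = torch.sigmoid((score_a + 100.0) - (score_b + 100.0))
assert torch.allclose(p_a_better, p_shifted)
print(p_a_better.item())
```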
So the absolute value of the output doesn't matter that much, only the comparison between the two scores? Thank you for your prompt attention to this matter.
Yes. The reward model provides a score of how "good" some generated text is. We can then optimize a generator so that it generates "better" text by using the reward model output as the opposite of the loss.
See the RLHF paper by OpenAI.
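A toy, hedged sketch of the idea (a simplified REINFORCE-style update, not the PPO setup from the paper or any training code shipped with this model; `policy`, `reward_fn`, and `tokenizer` are illustrative placeholders):

```python
import torch

def reinforce_step(policy, tokenizer, reward_fn, prompt, optimizer):
    """One toy policy update: use the frozen reward model's score as the opposite of the loss.

    `policy` is a causal LM and `reward_fn(prompt, text) -> float` wraps the
    frozen reward model; both are placeholders for illustration only.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    # Sample a continuation from the current policy.
    generated = policy.generate(**inputs, do_sample=True, max_new_tokens=50)
    text = tokenizer.decode(generated[0], skip_special_tokens=True)

    # The frozen reward model scores the generated text; no gradient flows through it.
    with torch.no_grad():
        reward = reward_fn(prompt, text)

    # Approximate log-probability of the sampled sequence under the policy
    # (outputs.loss is the mean negative log-likelihood per token).
    outputs = policy(generated, labels=generated)
    log_prob = -outputs.loss * generated.shape[1]

    # Higher reward -> lower loss: the reward acts as the negative of the loss.
    loss = -reward * log_prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward, text
```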