zhilinw committed on
Commit
751442b
1 Parent(s): 914e843

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -18,7 +18,7 @@ datasets:
 
 Llama-3.1-Nemotron-70B-Reward is a large language model customized using developed by NVIDIA to predict the quality of LLM generated responses. Specifically, it has been trained using a Llama-3.1-70B-Instruct Base on a novel approach combining the strength of Bradley Terry and SteerLM Regression Reward Modelling.
 
-Given a conversation with multiple turns between user and assistant (of up to 4,096 tokens), it rates the quality of the final assistant turn using a reward score.
+Given a English conversation with multiple turns between user and assistant (of up to 4,096 tokens), it rates the quality of the final assistant turn using a reward score.
 
 For the same prompt, a response with higher reward score has higher quality than another response with a lower reward score, but the same cannot be said when comparing the scores between responses to different prompts.
 