The use of this model is governed by the [Llama 2 Community License Agreement](https://ai.meta.com/llama/license/)

## Description:

Llama2-13B-SteerLM-RM is a 13 billion parameter language model (with a context length of up to 4,096 tokens) used as the Reward Model/Attribute Prediction Model in training [Llama2-70B-SteerLM-Chat](https://huggingface.co/nvidia/Llama2-70B-SteerLM-Chat).

Given a conversation with multiple turns between user and assistant, it rates the following attributes (between 0 and 4) for every assistant turn; a short sketch of consuming these scores appears below the list.

1. **Quality**: Perceived goodness of response
2. **Toxicity**: Undesirable elements such as vulgar, harmful or potentially biased responses
3. **Humor**: Sense of humor within responses
4. **Creativity**: Willingness to generate non-conventional responses
5. **Helpfulness**: Overall helpfulness of the response to the prompt.
6. **Correctness**: Inclusion of all pertinent facts without errors.
7. **Coherence**: Consistency and clarity of expression.
8. **Complexity**: Intellectual depth required to write response (i.e. whether the response can be written by anyone with basic language competency or requires deep domain expertise).
9. **Verbosity**: Amount of detail included in the response, relative to what is asked for in the prompt.

The first four attributes are taken from the [Open Assistant](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset while the others are taken from the [HelpSteer](https://huggingface.co/datasets/nvidia/HelpSteer) dataset.
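
To make the scoring interface concrete, here is a minimal Python sketch. The `score_turn` callable, the conversation format, and the stub values are illustrative assumptions standing in for an actual call to the reward model, not the model's real API; it only shows the shape of the task: nine values in [0, 4], one set per assistant turn.

```python
from typing import Callable

# Attribute order assumed to match the list above (an assumption for
# illustration; consult the model's actual documentation for the real order).
ATTRIBUTES = [
    "quality", "toxicity", "humor", "creativity", "helpfulness",
    "correctness", "coherence", "complexity", "verbosity",
]

def rate_conversation(
    turns: list[dict[str, str]],
    score_turn: Callable[[list[dict[str, str]]], list[float]],
) -> list[dict[str, float]]:
    """Rate every assistant turn, scoring each one with its full preceding context."""
    ratings = []
    for i, turn in enumerate(turns):
        if turn["role"] != "assistant":
            continue
        scores = score_turn(turns[: i + 1])  # context up to and including this turn
        assert len(scores) == len(ATTRIBUTES)
        ratings.append(dict(zip(ATTRIBUTES, scores)))
    return ratings

# Usage with a stub scorer (placeholder values, not real model output):
def stub(context: list[dict[str, str]]) -> list[float]:
    return [3.5, 0.2, 0.6, 1.0, 3.4, 3.6, 3.8, 1.7, 1.4]

conversation = [
    {"role": "user", "content": "Explain what a reward model does."},
    {"role": "assistant", "content": "It scores candidate responses so that..."},
]
print(rate_conversation(conversation, stub))
```

Scoring each assistant turn together with everything before it mirrors the description above: the ratings are per assistant turn, but the model sees the whole conversation up to that point.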

HelpSteer Paper : [HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM](http://arxiv.org/abs/2311.09528)