weqweasdas committed 70e2526 (parent 0656b31): Update README.md

Files changed (1): README.md +2 -1
README.md CHANGED

@@ -29,8 +29,9 @@ The model is trained on a mixture of the following datasets. We also provide the
 - [HelpSteer](https://huggingface.co/datasets/nvidia/HelpSteer)
 - [Orca](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs)
 
-Difference between this mixture and that of
+Differences between this mixture and the original datasets:
+- HH-RLHF: we only use the helpful subset, and we delete the noisy samples where chosen_response == rejected_response;
 - SHP: we only use the samples with score ratio > 2; for each prompt, we take at most 5 comparisons, leading to 109526 pairs;
 - UltraFeedback: similar to UltraFeedback-Binarized, we use the fine-grained scores instead of the overall one to rank samples. For each prompt, we take all possible 6 pairs of comparisons, then delete the selected pairs with equal scores, leading to 267416 pairs;
 - HelpSteer: we use the mean of helpfulness and correctness to rank samples. We take all possible 6 pairs of comparisons, then delete the selected pairs with equal scores, leading to 21576 pairs.
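The pair construction described in the diff (take all pairwise comparisons of a prompt's scored responses, drop pairs with equal scores, optionally cap the number of pairs per prompt as done for SHP) can be sketched roughly as below. This is a minimal illustration, not the released preprocessing code; `build_pairs` and its arguments are hypothetical names.

```python
from itertools import combinations

def build_pairs(responses, scores, max_pairs=None):
    """Build (chosen, rejected) preference pairs from scored responses.

    Hypothetical sketch: enumerates all pairwise comparisons
    (e.g. 4 responses -> C(4, 2) = 6 pairs), drops pairs whose
    scores tie, and optionally caps the pairs kept per prompt.
    """
    pairs = []
    for (resp_a, score_a), (resp_b, score_b) in combinations(
        zip(responses, scores), 2
    ):
        if score_a == score_b:
            continue  # delete the selected pairs with equal scores
        # Higher-scored response becomes "chosen", the other "rejected".
        if score_a > score_b:
            pairs.append((resp_a, resp_b))
        else:
            pairs.append((resp_b, resp_a))
    if max_pairs is not None:
        pairs = pairs[:max_pairs]  # per-prompt cap, as with SHP's 5
    return pairs

# 4 responses give up to 6 comparisons; one tie (3.0 vs 3.0) is dropped.
pairs = build_pairs(["a", "b", "c", "d"], [3.0, 1.0, 3.0, 2.0])
```

For HelpSteer, the `scores` passed in would be the mean of the helpfulness and correctness ratings; for UltraFeedback, the fine-grained scores.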