chargoddard committed
Commit 13928b4
1 Parent(s): 38da429

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -8,7 +8,7 @@ datasets:
  - lemonilia/LimaRP
  - PKU-Alignment/PKU-SafeRLHF
  - Intel/orca_dpo_pairs
- - argilla/ultrafeedback-binarized-preferences
+ - allenai/ultrafeedback_binarized_cleaned
  ---

  Another experiment in the line of [loyal-piano-m7](https://huggingface.co/chargoddard/loyal-piano-m7).
@@ -18,7 +18,7 @@ Steps taken to produce this model:
  * Train loyal-piano-m7
  * cDPO with HuggingFaceH4/ultrafeedback_binarized to produce loyal-piano-m7-cdpo
  * Train another model with different sampling of the same source datasets as loyal-piano, let's call it servile-harpsichord
- * cDPO servile-harpsichord with argilla/ultrafeedback-binarized-preferences, Intel/orca_dpo_pairs, and a helpfulness-only version of PKU-Alignment/PKU-SafeRLHF
+ * cDPO servile-harpsichord with allenai/ultrafeedback_binarized_cleaned, Intel/orca_dpo_pairs, and a helpfulness-only version of PKU-Alignment/PKU-SafeRLHF
  * TIES merge several checkpoints of servile-harpsichord-cdpo with loyal-piano-m7-cdpo

  Local benchmarks show the result to be better than any of the individual components. Let's see if that holds up!
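For context on the cDPO steps above: conservative DPO is the standard DPO objective with the preference labels treated as noisy at some rate ε, which smooths the loss and keeps the model from being pushed to unbounded confidence on mislabeled pairs. Below is a minimal PyTorch sketch of that objective; the function name and the default β and ε values are illustrative assumptions, not values taken from this repo.

```python
import torch
import torch.nn.functional as F

def cdpo_loss(policy_chosen_logps: torch.Tensor,
              policy_rejected_logps: torch.Tensor,
              ref_chosen_logps: torch.Tensor,
              ref_rejected_logps: torch.Tensor,
              beta: float = 0.1,   # assumed value, not from this repo
              eps: float = 0.1) -> torch.Tensor:  # assumed label-noise rate
    # Implicit reward margin of chosen over rejected, relative to the
    # frozen reference model.
    margin = (policy_chosen_logps - policy_rejected_logps) \
           - (ref_chosen_logps - ref_rejected_logps)
    # Plain DPO term for the label being correct, plus the flipped term
    # for the label being wrong with probability eps.
    loss = -(1 - eps) * F.logsigmoid(beta * margin) \
           - eps * F.logsigmoid(-beta * margin)
    return loss.mean()
```

With eps = 0 this reduces to plain DPO.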
 
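The final TIES merge (Yadav et al., 2023) resolves interference between checkpoints in three steps: trim each checkpoint's task vector to its largest-magnitude entries, elect a per-parameter sign, and average only the entries that agree with the elected sign. The sketch below is a simplified state-dict version assuming sum-based sign election and a uniform density; the actual merge parameters are not part of this commit, and in practice a tool like mergekit would be used.

```python
import torch

def ties_merge(base: dict, finetuned: list, density: float = 0.5) -> dict:
    # `density` (an assumed setting, not from this repo) is the fraction
    # of each task vector kept after trimming.
    merged = {}
    for name, base_param in base.items():
        # Task vectors: how far each checkpoint moved from the base weights.
        deltas = torch.stack([ft[name].float() - base_param.float()
                              for ft in finetuned])
        # Trim: zero everything below the top-`density` fraction by magnitude.
        k = max(1, int(density * base_param.numel()))
        for d in deltas:
            cutoff = d.abs().flatten().kthvalue(d.numel() - k + 1).values
            d[d.abs() < cutoff] = 0
        # Elect: per-parameter sign from the summed trimmed deltas.
        sign = torch.sign(deltas.sum(dim=0))
        # Disjoint merge: mean over entries that agree with the elected sign.
        agree = (torch.sign(deltas) == sign) & (deltas != 0)
        total = (deltas * agree).sum(dim=0)
        count = agree.sum(dim=0).clamp(min=1)
        merged[name] = base_param + total / count
    return merged
```

Merging several cDPO checkpoints this way keeps the parameter changes they agree on and drops conflicting ones, which is one plausible reason a merge can beat its individual components.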