chargoddard committed 13928b4 (parent: 38da429): Update README.md
README.md CHANGED

@@ -8,7 +8,7 @@ datasets:
 - lemonilia/LimaRP
 - PKU-Alignment/PKU-SafeRLHF
 - Intel/orca_dpo_pairs
--
+- allenai/ultrafeedback_binarized_cleaned
 ---
 
 Another experiment in the line of [loyal-piano-m7](https://huggingface.co/chargoddard/loyal-piano-m7).
@@ -18,7 +18,7 @@ Steps taken to produce this model:
 * Train loyal-piano-m7
 * cDPO with HuggingFaceH4/ultrafeedback_binarized to produce loyal-piano-m7-cdpo
 * Train another model with different sampling of the same source datasets as loyal-piano, let's call it servile-harpsichord
-* cDPO servile-harpsichord with
+* cDPO servile-harpsichord with allenai/ultrafeedback_binarized_cleaned, Intel/orca_dpo_pairs, and a helpfulness-only version of PKU-Alignment/PKU-SafeRLHF
 * TIES merge several checkpoints of servile-harpsichord-cdpo with loyal-piano-m7-cdpo
 
 Local benchmarks show the result to be better than any of the individual components. Let's see if that holds up!
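The cDPO steps in the README refer to conservative DPO, i.e. the DPO objective with label smoothing to account for noisy preference labels. The README does not show the loss itself, so as a minimal sketch (the function name and scalar-pair interface are illustrative, not the training code actually used):

```python
import math


def cdpo_loss(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp,
              beta=0.1, label_smoothing=0.2):
    """Conservative DPO (cDPO) loss for a single preference pair.

    Standard DPO minimizes -log sigmoid of the scaled implicit
    reward margin; cDPO assumes the preference label is flipped
    with probability `label_smoothing` and mixes in the loss for
    the opposite label. `label_smoothing=0.0` recovers plain DPO.
    """
    def log_sigmoid(x):
        return -math.log1p(math.exp(-x))

    # Implicit reward margin between chosen and rejected completions,
    # each measured relative to the frozen reference model.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # (1 - eps) weight on the given label, eps on the flipped label.
    return (-(1.0 - label_smoothing) * log_sigmoid(logits)
            - label_smoothing * log_sigmoid(-logits))
```

With a zero margin the loss is log 2 regardless of smoothing; as the policy's margin over the reference grows, the smoothed loss floors above zero instead of vanishing, which is what keeps cDPO conservative under label noise.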
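The final step is a TIES merge of several checkpoints. A toy sketch of the TIES-Merging procedure (trim each task vector, elect a per-parameter sign, then average only sign-agreeing values) over flat parameter lists — illustrative only, with made-up parameter names, not the merge tooling actually used:

```python
def ties_merge(base, finetuned_models, density=0.6, lam=1.0):
    """Toy TIES merge of several fine-tunes of one base model.

    `base` and each entry of `finetuned_models` are flat lists of
    parameters. `density` is the fraction of each task vector kept
    after magnitude trimming; `lam` scales the merged task vector.
    """
    n = len(base)
    deltas = []
    for params in finetuned_models:
        # Task vector: how this fine-tune moved away from the base.
        delta = [p - b for p, b in zip(params, base)]
        # Trim: zero all but the top-`density` fraction by magnitude.
        k = max(1, round(density * n))
        threshold = sorted((abs(d) for d in delta), reverse=True)[k - 1]
        deltas.append([d if abs(d) >= threshold else 0.0 for d in delta])
    merged = []
    for i in range(n):
        vals = [d[i] for d in deltas]
        # Elect a sign per parameter, then average only the values
        # that agree with it (disjoint mean), dropping interference.
        sign = 1.0 if sum(vals) >= 0 else -1.0
        agreeing = [v for v in vals if v * sign > 0]
        step = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(base[i] + lam * step)
    return merged
```

The sign election is what distinguishes TIES from a plain parameter average: where servile-harpsichord-cdpo and loyal-piano-m7-cdpo pull a weight in opposite directions, only the majority direction contributes rather than the two cancelling out.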
|