chargoddard committed 13928b4 (parent: 38da429): Update README.md
README.md CHANGED

@@ -8,7 +8,7 @@ datasets:
 - lemonilia/LimaRP
 - PKU-Alignment/PKU-SafeRLHF
 - Intel/orca_dpo_pairs
--
+- allenai/ultrafeedback_binarized_cleaned
 ---
 
 Another experiment in the line of [loyal-piano-m7](https://huggingface.co/chargoddard/loyal-piano-m7).
@@ -18,7 +18,7 @@ Steps taken to produce this model:
 * Train loyal-piano-m7
 * cDPO with HuggingFaceH4/ultrafeedback_binarized to produce loyal-piano-m7-cdpo
 * Train another model with different sampling of the same source datasets as loyal-piano, let's call it servile-harpsichord
-* cDPO servile-harpsichord with
+* cDPO servile-harpsichord with allenai/ultrafeedback_binarized_cleaned, Intel/orca_dpo_pairs, and a helpfulness-only version of PKU-Alignment/PKU-SafeRLHF
 * TIES merge several checkpoints of servile-harpsichord-cdpo with loyal-piano-m7-cdpo
 
 Local benchmarks show the result to be better than any of the individual components. Let's see if that holds up!
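The cDPO steps in the README refer to conservative DPO, i.e. the DPO objective with label smoothing to account for noisy preference labels. The README does not show the loss itself, so as a minimal sketch (the function name and scalar-pair interface are illustrative, not the training code actually used):

```python
import math


def cdpo_loss(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp,
              beta=0.1, label_smoothing=0.2):
    """Conservative DPO (cDPO) loss for a single preference pair.

    Standard DPO minimizes -log sigmoid of the scaled implicit
    reward margin; cDPO assumes the preference label is flipped
    with probability `label_smoothing` and mixes in the loss for
    the opposite label. `label_smoothing=0.0` recovers plain DPO.
    """
    def log_sigmoid(x):
        return -math.log1p(math.exp(-x))

    # Implicit reward margin between chosen and rejected completions,
    # each measured relative to the frozen reference model.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # (1 - eps) weight on the given label, eps on the flipped label.
    return (-(1.0 - label_smoothing) * log_sigmoid(logits)
            - label_smoothing * log_sigmoid(-logits))
```

With a zero margin the loss is log 2 regardless of smoothing; as the policy's margin over the reference grows, the smoothed loss floors above zero instead of vanishing, which is what keeps cDPO conservative under label noise.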
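The final step is a TIES merge of several checkpoints. A toy sketch of the TIES-Merging procedure (trim each task vector, elect a per-parameter sign, then average only sign-agreeing values) over flat parameter lists — illustrative only, with made-up parameter names, not the merge tooling actually used:

```python
def ties_merge(base, finetuned_models, density=0.6, lam=1.0):
    """Toy TIES merge of several fine-tunes of one base model.

    `base` and each entry of `finetuned_models` are flat lists of
    parameters. `density` is the fraction of each task vector kept
    after magnitude trimming; `lam` scales the merged task vector.
    """
    n = len(base)
    deltas = []
    for params in finetuned_models:
        # Task vector: how this fine-tune moved away from the base.
        delta = [p - b for p, b in zip(params, base)]
        # Trim: zero all but the top-`density` fraction by magnitude.
        k = max(1, round(density * n))
        threshold = sorted((abs(d) for d in delta), reverse=True)[k - 1]
        deltas.append([d if abs(d) >= threshold else 0.0 for d in delta])
    merged = []
    for i in range(n):
        vals = [d[i] for d in deltas]
        # Elect a sign per parameter, then average only the values
        # that agree with it (disjoint mean), dropping interference.
        sign = 1.0 if sum(vals) >= 0 else -1.0
        agreeing = [v for v in vals if v * sign > 0]
        step = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(base[i] + lam * step)
    return merged
```

The sign election is what distinguishes TIES from a plain parameter average: where servile-harpsichord-cdpo and loyal-piano-m7-cdpo pull a weight in opposite directions, only the majority direction contributes rather than the two cancelling out.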
|