README.md · chargoddard/servile-harpsichord-cdpo at 13cdf6bd90df46f4fae1d31b9d3b4f7fc31a7777

---
license: cc-by-nc-4.0
datasets:
  - pankajmathur/orca_mini_v1_dataset
  - openai/summarize_from_feedback
  - PygmalionAI/PIPPA
  - chargoddard/rpguild
  - lemonilia/LimaRP
  - PKU-Alignment/PKU-SafeRLHF
  - Intel/orca_dpo_pairs
  - argilla/ultrafeedback-binarized-preferences
---

Fine-tuned on a different random sample of the same datasets used for loyal-piano-m7, then trained further with cDPO on a blend of RLHF preference datasets.
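For readers unfamiliar with the term, the sketch below shows one common reading of cDPO: "conservative" DPO, where the usual DPO loss is mixed with its label-flipped counterpart to account for noisy preference labels. This is an assumption about what cDPO means here. The card does not give the training code or hyperparameters, so `beta`, `eps`, and all function names are illustrative only.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not taken from the model card
    eps: float = 0.1,   # assumed label-noise rate; eps = 0 gives plain DPO
) -> float:
    """Per-example conservative DPO loss (label-smoothed DPO)."""
    # Difference between policy and reference log-ratios for chosen vs. rejected.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # Weight (1 - eps) on the stated preference, eps on the flipped preference.
    return -(1.0 - eps) * log_sigmoid(beta * margin) - eps * log_sigmoid(-beta * margin)
```

With `eps = 0` this reduces to the standard DPO objective; a positive `eps` limits how confidently the policy is pushed toward any single preference label.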

Several intermediate checkpoints from the cDPO training run are available on branches of this repository.
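A minimal sketch of how a branch checkpoint might be loaded, assuming the `transformers` and `huggingface_hub` packages are installed. The branch names are not listed in this card, so the code discovers them at runtime; the helper names `list_branches` and `load_checkpoint` are hypothetical.

```python
REPO_ID = "chargoddard/servile-harpsichord-cdpo"


def list_branches(repo_id: str = REPO_ID) -> list[str]:
    """Return the branch names of the model repository (requires network access)."""
    from huggingface_hub import list_repo_refs

    return [branch.name for branch in list_repo_refs(repo_id).branches]


def load_checkpoint(revision: str = "main", repo_id: str = REPO_ID):
    """Load the tokenizer and model from a given branch, tag, or commit hash."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)
    return tokenizer, model
```

Calling `list_branches()` first and passing one of the returned names as `revision` avoids guessing a branch name.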

Uses the Alpaca prompt format.
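As an illustration, the helper below builds a prompt in the widely used Alpaca template. The exact wording used during training is not stated in this card, so the preamble text and the function name `build_alpaca_prompt` are assumptions based on the standard template.

```python
def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Format an instruction (and optional input) in the standard Alpaca layout."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


prompt = build_alpaca_prompt("Summarize the plot of Hamlet in two sentences.")
```

The model's completion is expected to follow the final `### Response:` line.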