---
license: cc-by-nc-4.0
datasets:
  - pankajmathur/orca_mini_v1_dataset
  - openai/summarize_from_feedback
  - PygmalionAI/PIPPA
  - chargoddard/rpguild
  - lemonilia/LimaRP
  - PKU-Alignment/PKU-SafeRLHF
  - Intel/orca_dpo_pairs
  - argilla/ultrafeedback-binarized-preferences
---

Fine-tuned on a different random sample of the same datasets used for loyal-piano-m7, then trained further with cDPO on a blend of RLHF datasets.
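For readers unfamiliar with the term, the sketch below shows the per-example loss as cDPO is commonly defined, assuming "cDPO" here means conservative DPO (DPO with label smoothing over possibly noisy preference labels). The function names and the `beta` and `label_smoothing` defaults are illustrative assumptions, not the settings used to train this model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # assumed value, not from the model card
    label_smoothing: float = 0.1,  # assumed value, not from the model card
) -> float:
    """Per-example conservative DPO loss.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model.
    """
    # How much more the policy prefers "chosen" over "rejected"
    # than the reference model does.
    margin = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    # With label_smoothing = 0 this reduces to the standard DPO loss.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(beta * margin)
        - label_smoothing * log_sigmoid(-beta * margin)
    )
```

The smoothing term keeps the loss from pushing the margin toward infinity when some preference labels may be flipped.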

Several intermediate checkpoints from cDPO training are available on separate branches of this repository.

Uses the Alpaca prompt format.
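A minimal sketch of building a prompt in that format follows. It assumes the standard Stanford Alpaca template wording; the card does not state the exact preamble used in training, and the helper name `build_alpaca_prompt` is hypothetical.

```python
# Standard Alpaca templates (assumed; the card only names the format).
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, context: str = "") -> str:
    """Return an Alpaca-style prompt; the model's reply follows '### Response:'."""
    instruction = instruction.strip()
    context = context.strip()
    if context:
        # Use the variant with an "### Input:" section when context is given.
        return ALPACA_WITH_INPUT.format(instruction=instruction, input=context)
    return ALPACA_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    print(build_alpaca_prompt("Summarize the plot of Hamlet in two sentences."))
```

Pass the resulting string to the tokenizer as-is and treat everything generated after `### Response:` as the model's answer.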