iqwiki-kor/Qwen2.5-7B-distill-SFT-DPO-beta0.01-Iter1-v2-Self-seed192-w-score Viewer • Updated 3 days ago • 60.9k • 9
iqwiki-kor/Qwen2.5-7B-distill-SFT-DPO-beta0.01-Iter1-v2-Self-seed42 Viewer • Updated 4 days ago • 60.9k • 10
iqwiki-kor/Qwen2.5-7B-distill-SFT-DPO-beta0.01-Iter1-v2-Self-seed192 Viewer • Updated 4 days ago • 60.9k • 7
Cross-lingual Transfer of Reward Models in Multilingual Alignment Paper • 2410.18027 • Published Oct 23, 2024
Cross-lingual Transfer of Reward Models Collection • This is the collection of synthetic preference data and trained reward models in "Cross-lingual Transfer of Reward Models in Multilingual Alignment". • 5 items • Updated Oct 31, 2024
iqwiki-kor/uf-g4o_translated-Qwen2.5-7B-distill-SFT-DPO-beta0.1-seed8049 Viewer • Updated Oct 30, 2024 • 56.8k • 36
iqwiki-kor/khs-Qwen2.5-7B-distill-SFT-DPO-beta0.1-seed6247 Viewer • Updated Oct 29, 2024 • 10.2k • 33