Article: Illustrating Reinforcement Learning from Human Feedback (RLHF) • Dec 9, 2022
Collection: Reasoning Datasets Collection • Distilled synthetic reasoning datasets • 7 items