Alex Havrilla

Dahoas

AI & ML interests

NLP, RL

Organizations

CarperAI, DuckAI, Critiquers, An optimal synthetic data sampling strategy for MATH

Articles

Illustrating Reinforcement Learning from Human Feedback (RLHF)