hba123 
posted an update about 17 hours ago
Blindly applying algorithms without understanding the math behind them is not a good idea, from my point of view. So, I am on a quest to fix this!

I wrote my first Hugging Face article on how to derive closed-form solutions for KL-regularised reinforcement learning problems, the kind of objective used for DPO.
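For a quick feel of what "closed-form solution" means here, below is a minimal sketch that is not taken from the article. It assumes the standard single-prompt objective E_pi[r] - beta * KL(pi || pi_ref) over a small discrete set of responses, whose well-known maximiser is pi*(y) proportional to pi_ref(y) * exp(r(y) / beta). The reward values, reference policy, beta, and all function names are made up for illustration.

```python
import math
import random


def kl_regularised_objective(pi, pi_ref, rewards, beta):
    """E_pi[r] - beta * KL(pi || pi_ref) over a discrete set of responses."""
    expected_reward = sum(p * r for p, r in zip(pi, rewards))
    kl = sum(p * math.log(p / q) for p, q in zip(pi, pi_ref) if p > 0)
    return expected_reward - beta * kl


def closed_form_policy(pi_ref, rewards, beta):
    """pi*(y) = pi_ref(y) * exp(r(y) / beta) / Z, with Z the normaliser."""
    weights = [q * math.exp(r / beta) for q, r in zip(pi_ref, rewards)]
    z = sum(weights)
    return [w / z for w in weights]


def random_policy(n, rng):
    """Draw an arbitrary probability vector of length n (for comparison only)."""
    draws = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical toy problem: three candidate responses to one prompt.
pi_ref = [0.5, 0.3, 0.2]
rewards = [1.0, 2.0, 0.5]
beta = 0.5

pi_star = closed_form_policy(pi_ref, rewards, beta)
best = kl_regularised_objective(pi_star, pi_ref, rewards, beta)

# No randomly drawn policy should score higher than the closed-form one.
rng = random.Random(0)
max_random = max(
    kl_regularised_objective(random_policy(3, rng), pi_ref, rewards, beta)
    for _ in range(1000)
)

print("closed-form policy:", pi_star)
print("objective at closed form:", best)
print("best of 1000 random policies:", max_random)
```

Under these assumptions the objective at the closed-form policy equals beta * log Z, which is a handy sanity check when following a derivation by hand.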


Check it out: https://huggingface.co/blog/hba123/derivingdpo