@chansung on Hugging Face: "simple guide on the recipe for GRPO on Open-R1 which is built on top of TRL…"

chansung

posted an update 13 days ago

A simple guide to the GRPO recipe in Open-R1, which is built on top of TRL.

I think the FastAPI wrapper around vLLM with WeightSyncWorker is a pretty cool feature. Also, we have many predefined reward functions out of the box!
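For anyone wondering what a reward function looks like: as far as I understand, TRL's GRPO trainer accepts plain Python callables that receive the batch of completions (plus extra keyword arguments) and return one float per completion. Here is a minimal sketch under that assumption, for plain-string completions. The name `think_answer_format_reward` and the tag layout are hypothetical illustrations, not Open-R1's actual code.

```python
import re

# Hypothetical layout: reasoning in <think>...</think>, then an <answer>...</answer> block.
_FORMAT = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)


def think_answer_format_reward(completions, **kwargs):
    """Return 1.0 for each completion that matches the expected layout, else 0.0.

    Assumes `completions` is a list of plain strings (non-conversational format).
    Extra keyword arguments are accepted and ignored.
    """
    return [1.0 if _FORMAT.match(text.strip()) else 0.0 for text in completions]


samples = [
    "<think>2 + 2 is 4</think>\n<answer>4</answer>",
    "The answer is 4",
]
print(think_answer_format_reward(samples))
```

A function shaped like this could, in principle, sit alongside the predefined ones in the list of reward functions given to the trainer.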

takarajordan

13 days ago

Very cool

chansung

13 days ago

Thanks!

smirki

13 days ago

Question! Does VRAM usage increase if you increase the max number of generations per prompt? If so, why does that happen?

chansung

13 days ago

I think so, because each additional generation is another full sequence, so more tokens have to be stored in VRAM?
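A rough way to see the scaling is to estimate the KV cache alone. This is only a back-of-the-envelope sketch: the helper `kv_cache_bytes` and the model shape are made up for illustration, and it ignores prompt-prefix sharing, activations, and logits, all of which change the real numbers.

```python
def kv_cache_bytes(num_generations, tokens_per_sequence, num_layers,
                   num_kv_heads, head_dim, bytes_per_value=2):
    """Rough KV-cache size for one prompt's group of generations.

    Each token stores one key and one value vector per layer, hence the factor 2.
    bytes_per_value=2 assumes 16-bit storage.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return num_generations * tokens_per_sequence * per_token


# Hypothetical model shape, chosen only for illustration.
shape = dict(num_layers=28, num_kv_heads=4, head_dim=128)

for g in (4, 8, 16):
    gib = kv_cache_bytes(g, tokens_per_sequence=1024, **shape) / 2**30
    print(f"{g:>2} generations -> {gib:.2f} GiB of KV cache")
```

Under these assumptions the estimate grows linearly with the number of generations: doubling the generations per prompt doubles the tokens that have to be kept around.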

xinnn63

12 days ago

Cool frens!

In this post

chansung chansung park
takarajordan Jordan Legg
smirki Manav
xinnn63 Natalie H