chansung 
posted an update 13 days ago
A simple guide to the GRPO recipe in Open-R1, which is built on top of TRL.

I think the FastAPI wrapper around vLLM with WeightSyncWorker is a pretty cool feature. Also, there are many predefined reward functions available out of the box!
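For anyone wondering what a reward function looks like in this setup, here is a minimal sketch. It assumes the TRL convention that a reward function receives a list of completions (plus extra keyword arguments) and returns one float per completion, and it assumes plain-string completions. The name `format_reward_sketch` and the tag layout are hypothetical illustrations, not the actual predefined functions from Open-R1.

```python
import re

# Hypothetical tag layout: reasoning in <think>, final answer in <answer>.
_TAG_LAYOUT = re.compile(r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL)


def format_reward_sketch(completions, **kwargs):
    """Return 1.0 for each completion that follows the tag layout, else 0.0.

    Assumes each completion is a plain string; extra keyword arguments
    (such as prompts or dataset columns) are accepted and ignored.
    """
    return [
        1.0 if _TAG_LAYOUT.fullmatch(text.strip()) else 0.0
        for text in completions
    ]


scores = format_reward_sketch([
    "<think>2 + 2 is 4</think><answer>4</answer>",
    "The answer is 4",
])
print(scores)
```

A function shaped like this could be passed alongside others so that each generation in a group gets a score, which GRPO then compares within the group.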

Very cool

Thanks!

Question! Does VRAM usage increase when you raise the maximum number of generations per prompt? If so, why does that happen?

Because more tokens have to be stored in VRAM?
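To put rough numbers on that guess, here is a back-of-the-envelope sketch of how the KV cache alone might grow with the number of generations per prompt. Every model dimension below (layer count, KV heads, head size, fp16 storage) is a hypothetical value picked for illustration, and the estimate ignores prefix sharing, activations, and logits, so treat it as an upper-bound intuition rather than a measurement.

```python
def kv_cache_bytes(
    num_generations,
    prompt_len,
    completion_len,
    num_layers=28,      # hypothetical model depth
    num_kv_heads=4,     # hypothetical number of key/value heads
    head_dim=128,       # hypothetical per-head dimension
    bytes_per_elem=2,   # fp16/bf16 storage
):
    """Estimate KV-cache size when every generation keeps its own cache."""
    # One key vector and one value vector per layer, per KV head, per token.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    # Each generation stores the prompt tokens plus its own completion tokens.
    total_tokens = num_generations * (prompt_len + completion_len)
    return per_token * total_tokens


for n in (4, 8, 16):
    mib = kv_cache_bytes(n, prompt_len=256, completion_len=1024) / 2**20
    print(f"{n} generations per prompt: about {mib:.0f} MiB of KV cache")
```

Under these assumptions the cache grows linearly: doubling the generations per prompt doubles the tokens that must be held in memory at once.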

Cool frens!