qq8933 
posted an update 16 days ago
LLaMA-O1-PRM and LLaMA-O1-Reinforcement will be released this weekend.
We have implemented a novel reinforcement fine-tuning (RFT) pipeline that teaches models reasoning and reward labeling without human annotation.

Looking forward to it

Not perfect, but it just works :)
