@qq8933 on Hugging Face: "LLaMA-O1-PRM and LLaMA-O1-Reinforcement will release in this weekend. We have…"

qq8933

posted an update 16 days ago

LLaMA-O1-PRM and LLaMA-O1-Reinforcement will be released this weekend.
We have implemented a novel reinforcement fine-tuning (RFT) pipeline that teaches models reasoning and reward labeling without human annotation.

AlexLINB

16 days ago

Looking forward to it

qq8933

16 days ago

Not perfect, but it just works :)

Teera

16 days ago

In this post

qq8933 Di Zhang
AlexLINB AlexLI
Teera Narak A'