nbd22/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora
Tags: Transformers, Safetensors, Generated from Trainer, trl, grpo, Inference Endpoints, arxiv:2402.03300
Need idea about reward function.
#1 by davinders - opened Jan 29
davinders (Jan 29): This comment has been hidden
davinders changed discussion status to closed (Jan 29)