details pls

by archit11 - opened Feb 1

Feb 1

Can you give some details on what data it was trained on and for how many steps? I tried it for GRPO with SmolLM 350M on GSM8K, but the results were really bad, so I stopped after a few steps.

ubermenchh

Owner Feb 1

This was just a random dry run, so don't expect anything from it. It was trained on 'trl-lib/ultrafeedback_binarized'.
I am planning to experiment with GRPO and SmolLM this week, so let's see how that goes.

ubermenchh changed discussion status to closed Feb 1
