Difference between this and the other (100 steps) model?

#1
by lemon07r - opened

I'm curious what the difference is between this model and the other one. The only difference I see is in the name: the "100 steps".

The "AALF/gemma-2-27b-it-SimPO-37K-100steps" model is a checkpoint of "AALF/gemma-2-27b-it-SimPO-37K" after training 100 global steps.

Is this model from before or after those 100 steps?

After. Refer to trainer_state.json.
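
For anyone who wants to check this themselves, here is a minimal sketch of reading the step count out of a locally downloaded trainer_state.json. It assumes the file follows the usual Hugging Face Trainer layout with a top-level `global_step` field; the helper name `read_global_step` is made up for illustration and is not something from the model repo.

```python
import json
from pathlib import Path


def read_global_step(state_path):
    """Return the `global_step` value stored in a trainer_state.json file.

    Assumption: the file was written by the Hugging Face Trainer and has a
    top-level `global_step` key. A KeyError is raised if it does not.
    """
    with Path(state_path).open(encoding="utf-8") as f:
        state = json.load(f)
    return state["global_step"]
```

Calling `read_global_step("trainer_state.json")` on the file from each repo and comparing the two numbers should show which checkpoint was saved at which point in training.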

Which one should we use?

AALF/gemma-2-27b-it-SimPO-37K-100steps is better.

I tried both. The 100 steps one is MUCH better. Meanwhile, this one has more likes and downloads 😅

@imoc somehow both have terrible scores on the open-llm-leaderboard.