
Stage clarifications

#6
by Neelectric - opened

Hey folks! Super cool model, really excited to play with it and its SFT/DPO/RLVR variants.

I've been exploring the revisions of this base model and found myself getting a bit confused by the output of list_repo_refs("allenai/OLMo-2-1124-7B").branches, specifically because some include the 'stageX' prefix and some don't. If you could clear up these three questions for me, I'd really appreciate that.
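
For context, this is roughly the listing I mean (a minimal sketch, assuming a recent huggingface_hub):

```python
# List every branch of the repo and print the revision names, sorted.
from huggingface_hub import list_repo_refs

refs = list_repo_refs("allenai/OLMo-2-1124-7B")
for branch in sorted(b.name for b in refs.branches):
    print(branch)
```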

  1. It seems to me that those which have no prefix, such as:

step150-tokens1B
step600-tokens3B
step700-tokens3B
step850-tokens4B
step900-tokens4B
step2000-tokens9B
step2150-tokens10B

might be missing the stage1 prefix, i.e. they should instead be

stage1-step150-tokens1B
stage1-step600-tokens3B
stage1-step700-tokens3B
stage1-step850-tokens4B
stage1-step900-tokens4B
stage1-step2000-tokens9B
stage1-step2150-tokens10B

Is that correct, and could you make this change? Or am I just misunderstanding the logic?

  2. Could you clarify at what intervals checkpoints are released? Most steps occur at 1,000-step intervals, but [150, 600, 700, 850, 900, 2150] seem to violate this rule, and [100000, 174000, 317000, 536000, 605000, 632000, 672000] seem to be missing (at least judging by some very dirty Copilot scripting, see the rough sketch after these questions; apologies if I'm missing something).

  3. Do I understand correctly that all three stage 2 variants start from the same checkpoint (corresponding to the end of stage 1), then diverge into three separate runs with ingredient1/ingredient2/ingredient3 respectively, and are finally merged with model souping?
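
(Rough sketch of the kind of check I mean for question 2; the branch-name pattern is assumed from the listing above, and I'm treating 1,000 steps as the expected grid, so take the output with a grain of salt.)

```python
# Extract the step number from each stage1/unprefixed branch name and compare
# against multiples of 1,000 up to the largest step seen.
import re
from huggingface_hub import list_repo_refs

refs = list_repo_refs("allenai/OLMo-2-1124-7B")
steps = sorted(
    int(m.group(1))
    for b in refs.branches
    if (m := re.match(r"(?:stage1-)?step(\d+)-tokens\d+B$", b.name))
)
expected = set(range(1000, max(steps) + 1, 1000))
print("off-grid steps:", [s for s in steps if s % 1000 != 0])
print("missing 1,000-step checkpoints:", sorted(expected - set(steps)))
```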

Thank you again to everybody at Ai2 for all that you do!

Hey @Neelectric,

  1. Missing stage1 prefix:
    You're correct: checkpoints such as step150-tokens1B, step600-tokens3B, etc. should indeed have the stage1 prefix. We are in the process of updating these references to include the prefix, so they will soon be listed as:
    stage1-step150-tokens1B
    stage1-step600-tokens3B
    …and so on.
  2. Intervals for checkpoints: Checkpoints are generally released at key milestones or after certain amounts of training progress, but there isn't a strict rule requiring 1,000-step intervals.
  3. Yes, your understanding is correct. Stage 2 begins with a shared checkpoint from the end of Stage 1. From there, three separate runs diverge, each trained with different ingredients (ingredient1, ingredient2, ingredient3). These runs are then combined using model souping techniques to produce the final model.
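
To make the souping step concrete, here's a minimal sketch of uniform weight averaging over several checkpoints of the same architecture. It illustrates the general technique only; the revision names are placeholders, not the actual OLMo 2 ingredient branches, and it is not Ai2's exact recipe.

```python
# Minimal illustration of model souping as uniform weight averaging.
# The revision names below are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM

repo = "allenai/OLMo-2-1124-7B"
revisions = ["ingredient1-final", "ingredient2-final", "ingredient3-final"]  # placeholders

models = [AutoModelForCausalLM.from_pretrained(repo, revision=r) for r in revisions]
state_dicts = [m.state_dict() for m in models]

# Average floating-point parameters key-by-key; copy non-float buffers as-is.
souped_state = {}
for key, ref in state_dicts[0].items():
    if ref.is_floating_point():
        souped_state[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    else:
        souped_state[key] = ref

souped = models[0]
souped.load_state_dict(souped_state)
```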

Awesome, thank you @amanrangapur for replying so quickly and answering all my questions! Really appreciate it!

Neelectric changed discussion status to closed
