GitBag/a_star_final_a_star_math_1.5_random_reward_actor Text Generation • 2B • Updated 17 days ago • 27
GitBag/a_star_final_a_star_math_1.5_wrong_reward_actor Text Generation • 2B • Updated 17 days ago • 24
GitBag/a_star_final_a_star_math_3_random_reward_actor Text Generation • 3B • Updated 18 days ago • 25
GitBag/a_star_final_a_star_math_7_random_reward_actor Text Generation • 8B • Updated 18 days ago • 62
GitBag/a_star_final_ds-distilled-qwen-1.5b-a-star-16384_actor Text Generation • 2B • Updated May 27 • 9
GitBag/a_star_final_ds-distilled-qwen-1.5b-grpo-2-kl-1e-4-16384_actor Text Generation • 2B • Updated May 27 • 26
GitBag/a_star_final_ds-distilled-qwen-1.5b-ppo-kl-1e-4-ec-0.001-16384_critic Token Classification • 2B • Updated May 12 • 6
GitBag/a_star_final_ds-distilled-qwen-1.5b-ppo-kl-1e-4-ec-0.001-16384_actor Text Generation • 2B • Updated May 12 • 40
GitBag/block-q-sharp_ds-distilled-qwen-1.5b-ppo-kl-1e-4-ec-0.001-14336_critic Token Classification • 2B • Updated May 7 • 8
GitBag/block-q-sharp_ds-distilled-qwen-1.5b-ppo-kl-1e-4-ec-0.001-14336_actor Text Generation • 2B • Updated May 7 • 11
GitBag/block-q-sharp_ds-distilled-qwen-1.5b-ppo-kl-1e-4-ec-0.001-16384_critic Token Classification • 2B • Updated May 7 • 7
GitBag/block-q-sharp_ds-distilled-qwen-1.5b-ppo-kl-1e-4-ec-0.001-16384_actor Text Generation • 2B • Updated May 7 • 57
GitBag/block-q-sharp_ds-distilled-qwen-1.5b-ppo-kl-1e-4-ec-0.001-good-1 Text Generation • 2B • Updated May 7 • 1.69k
GitBag/block-q-sharp_ds-distilled-qwen-1.5b-ppo-kl-1e-4-ec-0.001-run-1 Text Generation • 2B • Updated May 5 • 6
GitBag/block-q-sharp_ds-distilled-qwen-1.5b-grpo-2-kl-1e-4-14336_actor Text Generation • 2B • Updated May 3 • 6
GitBag/block-q-sharp_ds-distilled-qwen-1.5b-grpo-2-kl-1e-4-8192_actor Text Generation • 2B • Updated May 2 • 6
GitBag/lr5e-05-random-latent-with-latent-predictions-global_step_100 Text Generation • 2B • Updated Apr 18 • 10
GitBag/qwen2.5-1.5b-math-sft-bs-256-lr-1e-4-regress_prob-20-zl-no-bpt-global_step_160 Text Generation • 2B • Updated Apr 15 • 8
GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rfst_eta_1e4_lr_3e-7_1738016708 Text Generation • 8B • Updated Jan 28 • 46