Budget forcing?

#50
by mwettach - opened

I see a lot of "wait..." in the output. Even after the correct solution to a relatively simple problem has been presented and checked twice, the output still continues with "wait..." and further considerations. Possibly the authors have applied the budget forcing technique from Stanford's s1 model (https://github.com/simplescaling/s1, https://arxiv.org/abs/2501.19393) but have not yet found the right point at which to stop emitting "wait..." tokens and end the answer.
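For anyone unfamiliar: as I understand the s1 paper, budget forcing suppresses the end-of-thinking delimiter and appends "Wait" when the model tries to stop before a minimum budget, and forces the delimiter once a maximum budget is reached. Below is a minimal toy sketch of that loop, simplified to count whole reasoning steps instead of tokens. `END_OF_THINKING`, `toy_model`, and the step budgets are hypothetical and not taken from the s1 code.

```python
from typing import Callable, List

END_OF_THINKING = "</think>"  # hypothetical delimiter; real models use their own
WAIT = "Wait"


def budget_force(step: Callable[[List[str]], str], min_steps: int, max_steps: int) -> List[str]:
    """Toy budget forcing over reasoning steps (the paper counts tokens).

    `step` returns the next chunk; returning END_OF_THINKING means the
    model wants to stop thinking.
    """
    trace: List[str] = []
    while len(trace) < max_steps:
        chunk = step(trace)
        if chunk == END_OF_THINKING:
            if len(trace) >= min_steps:
                break  # minimum budget met: let the model stop
            trace.append(WAIT)  # too early: suppress the stop, nudge with "Wait"
            continue
        trace.append(chunk)
    trace.append(END_OF_THINKING)  # natural stop, or forced at max_steps
    return trace


def toy_model(trace: List[str]) -> str:
    # Hypothetical model: reasons twice, re-checks after a "Wait", then tries to stop.
    if trace and trace[-1] == WAIT:
        return "re-check"
    if len(trace) < 2:
        return f"step {len(trace) + 1}"
    return END_OF_THINKING


print(budget_force(toy_model, min_steps=4, max_steps=10))
# ['step 1', 'step 2', 'Wait', 're-check', '</think>']
```

The point of my complaint is exactly the `min_steps` knob: if it is set too high, the model keeps getting "Wait" appended even after it has already verified a correct answer.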

IMHO, the minimum number of verification methods is three. One is not enough, and two can disagree, leaving a 50:50 tie, so a third method acts as the tiebreaker. Unfortunately, the model sometimes has to take into account more aspects than can be verified by simple test procedures such as mathematical methods, and in those cases you do want it to use more complex, deeper thinking. I think it would be best to find a method that helps the model decide when to think deeper and when to just go with the simpler assumption.
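A minimal sketch of the "two out of three" idea, assuming each verification method returns True (passes), False (fails) or None (not applicable). The `decide` function, the `quorum` parameter, and the rule of escalating to "think_deeper" when no majority exists are my own hypothetical illustration, not something the model is known to do.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# A check returns True (passes), False (fails) or None (does not apply).
Check = Callable[[float], Optional[bool]]


def decide(answer: float, checks: Sequence[Check], quorum: int = 2) -> str:
    """Majority vote over independent verification methods.

    Returns "accept" or "reject" when at least `quorum` applicable checks
    agree and outvote the rest; otherwise "think_deeper" (e.g. a 1:1 tie).
    """
    results = [check(answer) for check in checks]
    votes = Counter(r for r in results if r is not None)
    if votes[True] >= quorum and votes[True] > votes[False]:
        return "accept"
    if votes[False] >= quorum and votes[False] > votes[True]:
        return "reject"
    return "think_deeper"


# Hypothetical checks for the toy problem 2x + 1 = 7.
def by_substitution(x: float) -> Optional[bool]:
    return 2 * x + 1 == 7


def by_inverse_operations(x: float) -> Optional[bool]:
    return (7 - 1) / 2 == x


def by_units(x: float) -> Optional[bool]:
    return None  # not applicable to a pure number, so it casts no vote


checks = [by_substitution, by_inverse_operations, by_units]
print(decide(3, checks))  # accept
print(decide(4, checks))  # reject
print(decide(3, [by_substitution, lambda x: False]))  # think_deeper
```

The "think_deeper" branch is where a model could switch to longer reasoning instead of blindly appending "wait..." after every answer.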
