Budget forcing?

#50

by mwettach - opened 30 days ago

30 days ago

I see a lot of "wait..." in the output. Even after the correct solution to a relatively simple problem has been presented and checked twice, the output continues with "wait..." and further considerations. Possibly the authors applied the budget forcing technique from the Stanford s1 model (https://github.com/simplescaling/s1, https://arxiv.org/abs/2501.19393) but have not yet found the right point at which to stop emitting "wait..." tokens and end the answer.
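
For readers unfamiliar with the term: in the s1 paper, budget forcing is a decoding-time control. When the model tries to end its thinking too early, the end-of-thinking delimiter is suppressed and "Wait" is appended; when the thinking budget is used up, the delimiter is appended to force an answer. The sketch below is a toy illustration of that loop, not the s1 code and not necessarily what this model does. The `next_token` callback, the `</think>` delimiter, and the parameter names are assumptions made up for the example.

```python
from typing import Callable, List

END_THINK = "</think>"  # assumed end-of-thinking delimiter (hypothetical)
WAIT = "Wait"           # string appended to extend the reasoning


def budget_forced_thinking(
    next_token: Callable[[List[str]], str],  # hypothetical model interface
    max_think_tokens: int,
    min_wait_extensions: int = 0,
) -> List[str]:
    """Return a thinking trace produced under budget forcing."""
    trace: List[str] = []
    extensions_used = 0
    while len(trace) < max_think_tokens:
        token = next_token(trace)
        if token == END_THINK:
            if extensions_used < min_wait_extensions:
                # Suppress the stop and nudge the model to keep reasoning.
                trace.append(WAIT)
                extensions_used += 1
                continue
            break  # the model may stop: enough extensions were forced
        trace.append(token)
    # The model stopped or the budget ran out: close the thinking phase.
    trace.append(END_THINK)
    return trace


def toy_model(trace: List[str]) -> str:
    """Stand-in model: one step, then it tries to stop; re-checks after a Wait."""
    if not trace:
        return "2+2=4"
    if trace[-1] == WAIT:
        return "check:4"
    return END_THINK


print(budget_forced_thinking(toy_model, max_think_tokens=10))
print(budget_forced_thinking(toy_model, max_think_tokens=10, min_wait_extensions=2))
```

With `min_wait_extensions=0` the toy model stops right after its first step; with `2` it is pushed through two extra "Wait" rounds even though the answer was already correct, which is the behavior described above. The open question is how to choose that number per problem rather than globally.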

MrDevolver

15 days ago

IMHO, the minimum number of test methods is 3. One is not enough, and two can disagree and leave the result uncertain (50:50), so a third is needed to break the tie. Unfortunately, the model sometimes has to take into account more aspects than can be verified by simple test procedures such as math methods, and then you want it to think in a more complex, deeper way. I think it would be best to find a method that helps the model decide when to think deeper and when to just go with the simpler assumption.
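
The "third check breaks the tie" idea can be sketched as a small majority vote that also stops early once the outcome is settled. This is only an illustration of the argument, assuming each verification method can be wrapped as a function returning an answer; `vote` and the example checks are hypothetical names, not part of any model's actual decoding.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, Tuple


def vote(checks: Sequence[Callable[[], Hashable]]) -> Tuple[Optional[Hashable], int]:
    """Run checks in order and stop as soon as one answer has a strict majority.

    Returns (answer, number_of_checks_run); answer is None if no majority exists.
    """
    needed = len(checks) // 2 + 1  # strict majority of all planned checks
    counts: Counter = Counter()
    for used, check in enumerate(checks, start=1):
        result = check()
        counts[result] += 1
        if counts[result] >= needed:
            return result, used  # further checks cannot change the outcome
    return None, len(checks)


# Three independent ways to verify 17 * 6.
checks = [
    lambda: 17 * 6,             # direct multiplication
    lambda: sum([17] * 6),      # repeated addition
    lambda: 10 * 6 + 7 * 6,     # distributive split
]
print(vote(checks))  # two agreeing checks settle it; the third is skipped
```

When the first two checks agree, the third is never run; it is only needed when they disagree. This covers the verifiable cases only. Problems that cannot be reduced to such checks still need some other signal for when deeper thinking is worth it.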
