Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets Paper • 2410.01779 • Published Oct 2
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 16 days ago
Cautious Optimizers: Improving Training with One Line of Code Paper • 2411.16085 • Published about 1 month ago