Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback Paper • 2501.10799 • Published 16 days ago • 14
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 77
Adaptive Decoding via Latent Preference Optimization Paper • 2411.09661 • Published Nov 14, 2024 • 10
Thinking LLMs: General Instruction Following with Thought Generation Paper • 2410.10630 • Published Oct 14, 2024 • 18
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge Paper • 2407.19594 • Published Jul 28, 2024 • 20
Improving Open Language Models by Learning from Organic Interactions Paper • 2306.04707 • Published Jun 7, 2023 • 3
Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions Paper • 2304.11063 • Published Apr 18, 2023
System 2 Attention (is something you might need too) Paper • 2311.11829 • Published Nov 20, 2023 • 40
Some things are more CRINGE than others: Preference Optimization with the Pairwise Cringe Loss Paper • 2312.16682 • Published Dec 27, 2023 • 5
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping Paper • 2402.14083 • Published Feb 21, 2024 • 47
Teaching Large Language Models to Reason with Reinforcement Learning Paper • 2403.04642 • Published Mar 7, 2024 • 46
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12, 2024 • 40
Memory-Augmented Reinforcement Learning for Image-Goal Navigation Paper • 2101.05181 • Published Jan 13, 2021