VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment Paper • 2410.01679 • Published Oct 2, 2024 • 24
Improving Context-Aware Preference Modeling for Language Models Paper • 2407.14916 • Published Jul 20, 2024 • 4