Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 38
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs Paper • 2410.04698 • Published Oct 7, 2024 • 13
Spurious Feature Diversification Improves Out-of-distribution Generalization Paper • 2309.17230 • Published Sep 29, 2023
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint Paper • 2312.11456 • Published Dec 18, 2023 • 1
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint Paper • 2312.11456 • Published Dec 18, 2023 • 1
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models Paper • 2306.12420 • Published Jun 21, 2023 • 2
Weakly Supervised Disentangled Generative Causal Representation Learning Paper • 2010.02637 • Published Oct 6, 2020