SePPO: Semi-Policy Preference Optimization for Diffusion Alignment Paper • 2410.05255 • Published Oct 7 • 4
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning Paper • 2407.00617 • Published Jun 30 • 7