REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published Jan 4, 2025 • 90
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing Paper • 2308.07926 • Published Aug 15, 2023 • 28
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales Paper • 2308.01320 • Published Aug 2, 2023 • 45
Challenges and Applications of Large Language Models Paper • 2307.10169 • Published Jul 19, 2023 • 48
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 244
Secrets of RLHF in Large Language Models Part I: PPO Paper • 2307.04964 • Published Jul 11, 2023 • 29