LLM-as-Judge
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References Paper • 2410.05193 • Published 5 days ago
RLHF
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions Paper • 2410.02743 • Published 8 days ago
Self-Boosting Large Language Models with Synthetic Preference Data Paper • 2410.06961 • Published 3 days ago