reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs Paper • 2503.11751 • Published 3 days ago
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Paper • 2502.09604 • Published Feb 13
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities Paper • 2411.04986 • Published Nov 7, 2024
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment Paper • 2404.12318 • Published Apr 18, 2024