Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models
wangchenglong
wangclnlp
GRAM-RR
Self-Training Generative Foundation Reward Models for Reward Reasoning
GRAM-R^2: Self-Training Generative Foundation Reward Models for Reward Reasoning
Paper • 2509.02492 • Published • 1
wangclnlp/GRAM-RR-LLaMA-3.1-8B-RewardModel
Text Generation • 8B • Updated • 6 • 2
wangclnlp/GRAM-RR-LLaMA-3.2-3B-RewardModel
Text Generation • 3B • Updated • 25
wangclnlp/GRAM-RR-TrainingData
Updated • 2
Probing-RM
Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models