
Reward model for Plan2Align, used for test-time preference alignment in translation on the zh→en, zh→de, and zh→ru language pairs.

@article{wang2025plan2align,
  title={Plan2Align: Predictive Planning Based Test-Time Preference Alignment in Paragraph-Level Machine Translation},
  author={Wang, Kuang-Da and Chen, Teng-Ruei and Hung, Yu Heng and Ding, Shuoyang and Wu, Yueh-Hua and Wang, Yu-Chiang Frank and Yang, Chao-Han Huck and Peng, Wen-Chih and Hsieh, Ping-Chun},
  journal={arXiv preprint arXiv:2502.20795},
  year={2025}
}

Using the Reward Model

import torch
from safetensors.torch import load_file
from trl import AutoModelForCausalLMWithValueHead

# The model is published in BF16, so load it in that dtype
RM = AutoModelForCausalLMWithValueHead.from_pretrained(
    "ray24724919/plan2align_rm", torch_dtype=torch.bfloat16
)
RM.eval()
RM.gradient_checkpointing_enable()  # enable only if needed (e.g. to save memory)

# Load the value-head weights and strip the "v_head." prefix from the keys
value_head_weights = load_file("path-to-valuehead-safetensors")
new_state_dict = {
    key.replace("v_head.", "", 1) if key.startswith("v_head.") else key: value
    for key, value in value_head_weights.items()
}
RM.v_head.load_state_dict(new_state_dict)

System prompt for translation reward modeling

messages = [{"role": "system", "content": "You are a helpful translator and only output the result."},
            {"role": "user", "content": f"### Translate this from Chinese to {language}, Chinese:\n{source}\n### {language}:"},
            {"role": "assistant", "content": translation}]
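A minimal sketch of how these messages might be assembled and scored. The `build_messages` helper simply wraps the prompt format above; the commented scoring lines are an assumption about typical `trl` value-head usage (the forward pass returns per-token values, and the last token's value is commonly taken as the reward), not a confirmed part of this repository's pipeline.

```python
def build_messages(source: str, language: str, translation: str) -> list[dict]:
    """Assemble the chat messages in the format the reward model expects."""
    return [
        {"role": "system",
         "content": "You are a helpful translator and only output the result."},
        {"role": "user",
         "content": f"### Translate this from Chinese to {language}, "
                    f"Chinese:\n{source}\n### {language}:"},
        {"role": "assistant", "content": translation},
    ]

# Scoring sketch (requires the loaded RM and a matching tokenizer; the exact
# reduction over the value head is an assumption):
# input_ids = tokenizer.apply_chat_template(
#     build_messages(src, "English", hypothesis), return_tensors="pt"
# )
# _, _, values = RM(input_ids)      # value head emits one score per token
# reward = values[0, -1].item()     # use the last-token value as the reward
```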
Model size: 8.03B params (BF16)
