---
library_name: transformers
tags:
- reward-model
- RLHF
datasets:
- PKU-Alignment/PKU-SafeRLHF-30K
language:
- en
base_model:
- Nagi-ovo/llama-3-8b-dpo-full
pipeline_tag: text-classification
---

## Overview

This reward model predicts human preferences between pairs of responses to a prompt. It was trained on pairwise preference data from PKU-Alignment/PKU-SafeRLHF-30K and outputs a single scalar score per prompt-response pair, with higher scores indicating responses judged more preferable. It is designed to be used as part of a Reinforcement Learning from Human Feedback (RLHF) pipeline.

## Model Architecture

- Base Model: Llama 3 8B with SFT and DPO (Nagi-ovo/llama-3-8b-dpo-full)
- Output: single scalar reward value
- Parameters: 8B
- Training Framework: DeepSpeed + TRL

## Example Usage

The snippet below loads the model with 4-bit NF4 quantization and scores a single prompt-answer pair:

```python
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BitsAndBytesConfig,
)

device = "cuda:0"
model_name = "Nagi-ovo/Llama-3-8B-RM"

# Load the reward model in 4-bit NF4 so it fits on a single GPU.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map=device,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

SYSTEM_PROMPT = "You are a helpful assistant"

def format_prompt_answer(prompt, answer):
    """Format a prompt-answer pair with the template used for reward-model training."""
    return (
        f"###System: {SYSTEM_PROMPT}\n"
        f"###Question: {prompt}\n"
        f"###Answer: {answer}<|end_of_text|>"
    )

def get_reward_score(prompt, answer):
    """Return the scalar reward the model assigns to a prompt-answer pair."""
    formatted_input = format_prompt_answer(prompt, answer)
    inputs = tokenizer(formatted_input, return_tensors="pt").to(device)
    with torch.no_grad():
        # The sequence-classification head outputs a single logit: the reward.
        output = model(inputs["input_ids"]).logits
    return output.item()

prompt = "How are you?"
answer = "I'm doing great! Thank you for asking. How can I help you today?"

score = get_reward_score(prompt, answer)
print(f"Prompt: {prompt}")
print(f"Answer: {answer}")
print(f"Reward Score: {score}")
```
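
## Comparing Candidate Responses

Because the model returns a single scalar per prompt-answer pair, it can also rank several candidate completions for the same prompt, which is how a reward model is typically consumed in an RLHF pipeline (e.g. for rejection sampling or as the reward signal during policy optimization). The snippet below is a minimal sketch that reuses the `get_reward_score` helper from the example above; the prompt and candidate answers are illustrative placeholders, not taken from the training data.

```python
# Minimal sketch: rank candidate answers for one prompt with the reward model.
# Assumes the model, tokenizer, and get_reward_score() from the example above.
prompt = "How do I reset my router?"
candidates = [
    "Unplug the router, wait about 30 seconds, then plug it back in.",
    "I don't know.",
]

# Score every candidate and keep the one the reward model rates highest.
scored = [(get_reward_score(prompt, answer), answer) for answer in candidates]
best_score, best_answer = max(scored, key=lambda pair: pair[0])

for score, answer in scored:
    print(f"{score:+.3f}  {answer}")
print(f"Preferred answer (score {best_score:+.3f}): {best_answer}")
```

Note that the raw scores are not calibrated probabilities; only their relative ordering for the same prompt is meaningful.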