---
library_name: transformers
tags:
- reward-model
- RLHF
datasets:
- PKU-Alignment/PKU-SafeRLHF-30K
language:
- en
base_model:
- Nagi-ovo/llama-3-8b-dpo-full
pipeline_tag: text-classification
---
## Overview
This reward model is trained to predict human preferences between pairs of responses to various prompts. It is designed to be used as part of a Reinforcement Learning from Human Feedback (RLHF) pipeline.
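As a rough illustration of how two scalar rewards can be turned into a preference prediction, the sketch below uses the Bradley-Terry formulation that is common for pairwise reward models. This card does not state the exact training objective, so treat the formulation as an assumption; the function names `preference_probability` and `pairwise_loss` are hypothetical and not part of this repository.

```python
import math

def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the first response is preferred over the second."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the observed preference (assumed objective)."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))

# Equal rewards mean the model has no preference between the two responses.
print(preference_probability(0.0, 0.0))
# A larger margin in favor of the chosen response gives a smaller loss.
print(pairwise_loss(2.0, -1.0) < pairwise_loss(0.5, 0.0))
```

Only the difference between the two rewards matters under this formulation, so absolute scores are meaningful mainly when compared across answers to the same prompt.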
## Model Architecture
- Base Model: Llama-3-8B after SFT and DPO (`Nagi-ovo/llama-3-8b-dpo-full`)
- Output: Single scalar reward value
- Parameters: 8B
- Training Framework: DeepSpeed + TRL
## Example Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, BitsAndBytesConfig
import torch

device = 'cuda:0'
model_name = "Nagi-ovo/Llama-3-8B-RM"

# Load the reward model with 4-bit NF4 quantization (requires bitsandbytes and a CUDA GPU)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4"),
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

SYSTEM_PROMPT = "You are a helpful assistant"

def format_prompt_answer(prompt, answer):
    """Format the input for reward model evaluation"""
    return f"###System: {SYSTEM_PROMPT}\n###Question: {prompt}\n###Answer: {answer}<|end_of_text|>"

def get_reward_score(prompt, answer):
    """Get reward score for a given prompt-answer pair"""
    formatted_input = format_prompt_answer(prompt, answer)
    inputs = tokenizer(formatted_input, return_tensors='pt').to(device)
    with torch.no_grad():
        # The classification head returns one logit, which is the scalar reward
        output = model(inputs['input_ids']).logits
    return output.item()

prompt = "How are you?"
answer = "I'm doing great! Thank you for asking. How can I help you today?"
score = get_reward_score(prompt, answer)
print(f"Prompt: {prompt}")
print(f"Answer: {answer}")
print(f"Reward Score: {score}")
```
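Because the model returns one scalar per prompt-answer pair, a typical use is to rank several candidate answers to the same prompt. The sketch below is one possible way to do that; `rank_answers` and `dummy_score` are hypothetical helpers introduced here for illustration. The stand-in scorer only exists so the snippet runs without a GPU, and in practice you would pass `get_reward_score` from the example above.

```python
from typing import Callable, List, Tuple

def rank_answers(
    prompt: str,
    answers: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each candidate answer and return (answer, score) pairs, best first."""
    scored = [(answer, score_fn(prompt, answer)) for answer in answers]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def dummy_score(prompt: str, answer: str) -> float:
    """Stand-in scorer (answer length); replace with get_reward_score for real use."""
    return float(len(answer))

candidates = ["Fine.", "I'm doing great, thanks!"]
ranked = rank_answers("How are you?", candidates, dummy_score)
best_answer, best_score = ranked[0]
print(f"Best answer: {best_answer}")
```

Scores from different prompts are not necessarily comparable, so this kind of ranking is best kept to candidates that share a prompt.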