---
library_name: transformers
tags:
- reward-model
- RLHF
datasets:
- PKU-Alignment/PKU-SafeRLHF-30K
language:
- en
base_model:
- Nagi-ovo/llama-3-8b-dpo-full
pipeline_tag: text-classification
---

## Overview

This reward model assigns a single scalar score to a prompt-response pair. It is trained on the pairwise human-preference data in PKU-Alignment/PKU-SafeRLHF-30K to score preferred responses higher, and it is designed to provide the reward signal in a Reinforcement Learning from Human Feedback (RLHF) pipeline (see the comparison example after the usage code below).

## Model Architecture

- Base Model: [Nagi-ovo/llama-3-8b-dpo-full](https://huggingface.co/Nagi-ovo/llama-3-8b-dpo-full) (Llama 3 8B after SFT and DPO)
- Output: Single scalar reward value
- Parameters: 8B
- Training Framework: DeepSpeed + TRL (a minimal training sketch follows below)
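
The exact training recipe for this checkpoint is not published in this card, but the sketch below shows the general shape of reward modelling with TRL's `RewardTrainer` on PKU-SafeRLHF-30K. Treat it as a minimal sketch rather than the actual configuration: the dataset column handling, the reuse of the inference prompt template, and the hyperparameters are assumptions, and the `RewardTrainer` signature differs across TRL releases (older versions expect pre-tokenized `input_ids_chosen`/`input_ids_rejected` columns and take `tokenizer=` instead of `processing_class=`).

```python
# Minimal reward-model training sketch (illustrative, not the exact recipe for this checkpoint)
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

BASE = "Nagi-ovo/llama-3-8b-dpo-full"
SYSTEM_PROMPT = "You are a helpful assistant"

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 has no dedicated pad token

# A sequence-classification head with a single label serves as the scalar reward head
model = AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id


def to_preference_pair(example):
    # Column names follow the PKU-SafeRLHF-30K dataset card; verify them before running
    better = example["better_response_id"]
    chosen, rejected = example[f"response_{better}"], example[f"response_{1 - better}"]
    template = "###System: {sys}\n###Question: {q}\n###Answer: {a}<|end_of_text|>"
    return {
        "chosen": template.format(sys=SYSTEM_PROMPT, q=example["prompt"], a=chosen),
        "rejected": template.format(sys=SYSTEM_PROMPT, q=example["prompt"], a=rejected),
    }


dataset = load_dataset("PKU-Alignment/PKU-SafeRLHF-30K", split="train")
pairs = dataset.map(to_preference_pair, remove_columns=dataset.column_names)

trainer = RewardTrainer(
    model=model,
    args=RewardConfig(output_dir="llama-3-8b-rm", per_device_train_batch_size=1, max_length=1024),
    processing_class=tokenizer,  # called `tokenizer=` in older TRL releases
    train_dataset=pairs,  # RewardTrainer minimizes -log sigmoid(r_chosen - r_rejected)
)
trainer.train()
```

DeepSpeed would typically be enabled through the `deepspeed` field of the training arguments or an `accelerate` launcher config; it does not change the modelling code above.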
## Example Usage

```python
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BitsAndBytesConfig,
)

device = "cuda:0"
model_name = "Nagi-ovo/Llama-3-8B-RM"

# Load the reward model with 4-bit NF4 quantization so it fits on a single GPU
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
    ),
    device_map=device,
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

SYSTEM_PROMPT = "You are a helpful assistant"


def format_prompt_answer(prompt, answer):
    """Format a prompt-answer pair the same way the reward model saw it during training."""
    return f"###System: {SYSTEM_PROMPT}\n###Question: {prompt}\n###Answer: {answer}<|end_of_text|>"


def get_reward_score(prompt, answer):
    """Return the scalar reward score for a prompt-answer pair (higher is better)."""
    formatted_input = format_prompt_answer(prompt, answer)
    inputs = tokenizer(formatted_input, return_tensors="pt").to(device)

    with torch.no_grad():
        # Logits have shape (1, 1): one scalar score for the single input sequence
        logits = model(**inputs).logits

    return logits.item()


prompt = "How are you?"
answer = "I'm doing great! Thank you for asking. How can I help you today?"

score = get_reward_score(prompt, answer)
print(f"Prompt: {prompt}")
print(f"Answer: {answer}")
print(f"Reward Score: {score}")
```
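
Because the model is trained on pairwise preferences, its scores are most meaningful when comparing candidate responses to the same prompt, for example in best-of-n sampling or when building an RLHF reward signal. The snippet below reuses `get_reward_score` from the example above; the prompt and the two candidate answers are made up for illustration.

```python
# Rank several candidate answers to the same prompt by reward score
prompt = "How can I stay safe when cycling at night?"
candidates = [
    "Just ride fast so you spend less time in the dark.",
    "Use front and rear lights, wear reflective clothing, and follow traffic rules.",
]

# Higher score = more preferred by the reward model
scored = sorted(
    ((get_reward_score(prompt, answer), answer) for answer in candidates),
    reverse=True,
)

for score, answer in scored:
    print(f"{score:+.4f}  {answer}")

print(f"\nPreferred answer: {scored[0][1]}")
```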