	---
	language:
	- en
	license: llama3
	tags:
	- text-classification
	datasets:
	- openbmb/UltraFeedback
	- nvidia/HelpSteer
	- Anthropic/hh-rlhf
	- PKU-Alignment/PKU-SafeRLHF
	- NCSOFT/offsetbias
	base_model:
	- sfairXC/FsfairX-LLaMA3-RM-v0.1
	- meta-llama/Meta-Llama-3-8B-Instruct
	---

	# Model Card for Llama-3-OffsetBias-RM-8B

	Llama-3-OffsetBias-RM-8B is a reward model trained on the OffsetBias dataset. It is trained to be more robust to the evaluation biases commonly found in evaluation models. The model is introduced in the paper [OffsetBias: Leveraging Debiased Data for Tuning Evaluators](https://arxiv.org/abs/2407.06551).

	## Model Details

	### Model Description

	Llama-3-OffsetBias-RM-8B uses [sfairXC/FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1) as its base model, which is built with Meta Llama 3. An intermediate reward model is first trained from [Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on a subset of the data used to train the FsfairX-LLaMA3-RM model, combined with the NCSOFT/offsetbias dataset. The intermediate model is then merged with the FsfairX-LLaMA3-RM model to create Llama-3-OffsetBias-RM-8B.

	- Developed by: NC Research
	- Language(s) (NLP): English
	- License: META LLAMA 3 COMMUNITY LICENSE AGREEMENT
	- Finetuned from model: [sfairXC/FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1)
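
The merge step in the description above can be pictured with a small sketch. This card does not state which merging method was used, so the example below assumes plain linear interpolation of matching parameters purely for illustration; `linear_merge`, `alpha`, and the toy parameter names and values are all hypothetical and are not taken from the released checkpoints.

```python
from typing import Dict, List

# Parameter name -> flattened values (a toy stand-in for a real state dict).
Weights = Dict[str, List[float]]


def linear_merge(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Return alpha * a + (1 - alpha) * b for every parameter.

    Assumption: linear interpolation is used here only as an example of
    weight merging; the actual method behind this model may differ.
    """
    if a.keys() != b.keys():
        raise ValueError("both checkpoints must expose the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [
            alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])
        ]
    return merged


# Hypothetical toy weights standing in for the two reward models.
intermediate_rm = {"score.weight": [1.0, 3.0], "layer.bias": [0.0]}
fsfairx_rm = {"score.weight": [3.0, 1.0], "layer.bias": [2.0]}

merged_rm = linear_merge(intermediate_rm, fsfairx_rm, alpha=0.5)
```

Merging in weight space only makes sense when both models share the same architecture and parameter names, which is why the sketch checks the keys before interpolating.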

	### Model Sources

	- 💻 Repository: [https://github.com/ncsoft/offsetbias](https://github.com/ncsoft/offsetbias)
	- 📜 Paper: [OffsetBias: Leveraging Debiased Data for Tuning Evaluators](https://arxiv.org/abs/2407.06551)
	- 🤗 Dataset: [https://huggingface.co/datasets/NCSOFT/offsetbias](https://huggingface.co/datasets/NCSOFT/offsetbias)

	## Uses

	### Direct Use

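	```python
	from transformers import AutoTokenizer, pipeline
	import torch

	model_name = "NCSOFT/Llama-3-OffsetBias-RM-8B"
	rm_tokenizer = AutoTokenizer.from_pretrained(model_name)

	# "sentiment-analysis" is an alias of the text-classification pipeline.
	rm_pipe = pipeline(
	    "sentiment-analysis",
	    model=model_name,
	    # "auto" is a device_map value, not a torch device string;
	    # it requires the `accelerate` package to be installed.
	    device_map="auto",
	    tokenizer=rm_tokenizer,
	    model_kwargs={"torch_dtype": torch.bfloat16},
	)

	pipe_kwargs = {
	    "return_all_scores": True,     # return the score of every label
	    "function_to_apply": "none",   # keep the raw logit as the reward
	    "batch_size": 1,
	}

	chat = [
	    {"role": "user", "content": "Hello, how are you?"},
	    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
	    {"role": "user", "content": "I'd like to show off how chat templating works!"},
	]

	# Render the conversation with the chat template and drop the BOS token,
	# because the pipeline's tokenizer adds it again.
	test_texts = [
	    rm_tokenizer.apply_chat_template(
	        chat, tokenize=False, add_generation_prompt=False
	    ).replace(rm_tokenizer.bos_token, "")
	]
	pipe_outputs = rm_pipe(test_texts, **pipe_kwargs)

	# One reward per input text: the score of the first (only) label.
	rewards = [output[0]["score"] for output in pipe_outputs]
	```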
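
A common way to use these rewards is to score several candidate replies to the same prompt and keep the highest-scoring one. The sketch below is one possible way to do that, not part of the original card: `rank_responses`, `make_score_fn`, and `toy_score` are hypothetical helper names. `make_score_fn` wraps the `rm_pipe`, `rm_tokenizer`, and `pipe_kwargs` objects from the snippet above, while `toy_score` is a stand-in so the example runs without downloading the model.

```python
from typing import Callable, Dict, List, Tuple

Chat = List[Dict[str, str]]


def rank_responses(
    score_fn: Callable[[Chat], float], prompt: str, candidates: List[str]
) -> List[Tuple[str, float]]:
    """Score each candidate reply to `prompt` and sort by reward, best first."""
    scored = []
    for reply in candidates:
        chat = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": reply},
        ]
        scored.append((reply, score_fn(chat)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def make_score_fn(rm_pipe, rm_tokenizer, pipe_kwargs) -> Callable[[Chat], float]:
    """Adapter around the pipeline objects built in the Direct Use snippet."""

    def score(chat: Chat) -> float:
        text = rm_tokenizer.apply_chat_template(
            chat, tokenize=False, add_generation_prompt=False
        ).replace(rm_tokenizer.bos_token, "")
        return rm_pipe([text], **pipe_kwargs)[0][0]["score"]

    return score


# Stand-in scorer (hypothetical): rewards longer replies, which is exactly
# the kind of length bias a real reward model should avoid.
def toy_score(chat: Chat) -> float:
    return float(len(chat[-1]["content"]))


ranking = rank_responses(toy_score, "What is 2 + 2?", ["4", "2 + 2 equals 4."])
best_reply, best_reward = ranking[0]
```

With the real model loaded, replace `toy_score` with `make_score_fn(rm_pipe, rm_tokenizer, pipe_kwargs)`.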

	## Evaluation

	### RewardBench Result

	| Metric | Score |
	|--------------|--------|
	| Chat | 97.21 |
	| Chat Hard | 80.70 |
	| Safety | 89.01 |
	| Reasoning | 90.60 |

	### EvalBiasBench Result

	| Metric | Score |
	|-----------------------|-------|
	| Length | 82.4 |
	| Concreteness | 92.9 |
	| Empty Reference | 46.2 |
	| Content Continuation | 100.0 |
	| Nested Instruction | 83.3 |
	| Familiar Knowledge | 58.3 |
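
Both benchmarks compare a preferred response against a dispreferred one, and scores of this kind are typically reported as the percentage of pairs in which the reward model ranks the preferred response higher. The sketch below shows that calculation on made-up numbers. `pairwise_accuracy` and `toy_pairs` are hypothetical, and counting ties as misses is an assumption rather than either benchmark's documented rule.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(reward_pairs: Iterable[Tuple[float, float]]) -> float:
    """Percentage of (chosen, rejected) pairs where chosen gets the higher reward.

    Assumption: a tie counts as a miss.
    """
    pairs = list(reward_pairs)
    if not pairs:
        raise ValueError("need at least one reward pair")
    wins = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return 100.0 * wins / len(pairs)


# Made-up (chosen_reward, rejected_reward) pairs, not benchmark data.
toy_pairs = [(1.2, -0.3), (0.4, 0.9), (2.0, 1.5), (-1.0, -2.0)]
accuracy = pairwise_accuracy(toy_pairs)
```

Only the ordering within each pair matters here, so the raw logits returned with `"function_to_apply": "none"` can be used directly without any normalization.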

	## Citation

	```bibtex
	@misc{park2024offsetbias,
	    title={OffsetBias: Leveraging Debiased Data for Tuning Evaluators},
	    author={Junsoo Park and Seungyeon Jwa and Meiying Ren and Daeyoung Kim and Sanghyuk Choi},
	    year={2024},
	    eprint={2407.06551},
	    archivePrefix={arXiv},
	    primaryClass={cs.CL}
	}
	```