metadata
language:
- en
license: llama3
tags:
- text-classification
datasets:
- openbmb/UltraFeedback
- nvidia/HelpSteer
- Anthropic/hh-rlhf
- PKU-Alignment/PKU-SafeRLHF
- NCSOFT/offsetbias
base_model:
- sfairXC/FsfairX-LLaMA3-RM-v0.1
- meta-llama/Meta-Llama-3-8B-Instruct
Model Card for Llama-3-OffsetBias-RM-8B
Llama-3-OffsetBias-RM-8B is a reward model trained on the OffsetBias dataset. It is trained to be more robust to various evaluation biases commonly found in evaluation models. The model is introduced in the paper OffsetBias: Leveraging Debiased Data for Tuning Evaluators.
Model Details
Model Description
Llama-3-OffsetBias-RM-8B uses sfairXC/FsfairX-LLaMA3-RM-v0.1 as its base model, which is built with Meta Llama 3. An intermediate reward model is first trained from Llama-3-8B-Instruct on a subset of the data used to train FsfairX-LLaMA3-RM, combined with the NCSOFT/offsetbias dataset. The intermediate model is then merged with FsfairX-LLaMA3-RM to create Llama-3-OffsetBias-RM-8B.
- Developed by: NC Research
- Language(s) (NLP): English
- License: llama3
- Finetuned from model: sfairXC/FsfairX-LLaMA3-RM-v0.1
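The merge step described above can be pictured as combining two sets of weights that share the same parameter names. The following is a minimal sketch only: it uses plain linear interpolation on toy Python lists, and both the `merge_weights` helper and the `alpha` mixing ratio are hypothetical. This model card does not state which merge method or ratio was actually used.

```python
from typing import Dict, List

# Toy representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def merge_weights(base: Weights, other: Weights, alpha: float = 0.5) -> Weights:
    """Linearly interpolate two weight sets: (1 - alpha) * base + alpha * other."""
    if base.keys() != other.keys():
        raise ValueError("both models must expose the same parameter names")
    merged: Weights = {}
    for name, base_values in base.items():
        other_values = other[name]
        if len(base_values) != len(other_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - alpha) * b + alpha * o
            for b, o in zip(base_values, other_values)
        ]
    return merged


# Hypothetical stand-ins for the two reward models (not real weights).
fsfairx_rm = {"score.weight": [1.0, 2.0], "layer.bias": [0.0, 4.0]}
intermediate_rm = {"score.weight": [3.0, 0.0], "layer.bias": [2.0, 0.0]}

merged_rm = merge_weights(fsfairx_rm, intermediate_rm, alpha=0.5)
print(merged_rm)
```

With `alpha=0.5` each merged value is the plain average of the two inputs; other merge strategies would replace the interpolation line.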
Model Sources
- 💻 Repository: https://github.com/ncsoft/offsetbias
- 📜 Paper: OffsetBias: Leveraging Debiased Data for Tuning Evaluators
- 🤗 Dataset: https://huggingface.co/datasets/NCSOFT/offsetbias
Uses
Direct Use
import torch
from transformers import AutoTokenizer, pipeline

model_name = "NCSOFT/Llama-3-OffsetBias-RM-8B"
rm_tokenizer = AutoTokenizer.from_pretrained(model_name)

# The reward model is served through the text-classification pipeline
# ("sentiment-analysis" is an alias for that task).
rm_pipe = pipeline(
    "sentiment-analysis",
    model=model_name,
    device_map="auto",  # requires the accelerate package
    tokenizer=rm_tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
)

# "function_to_apply": "none" returns the raw logit, which is used as the reward.
pipe_kwargs = {
    "return_all_scores": True,
    "function_to_apply": "none",
    "batch_size": 1,
}

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

# Render the conversation with the chat template and drop the BOS token,
# since the pipeline's tokenizer adds it again.
test_texts = [
    rm_tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=False
    ).replace(rm_tokenizer.bos_token, "")
]
pipe_outputs = rm_pipe(test_texts, **pipe_kwargs)
rewards = [output[0]["score"] for output in pipe_outputs]
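A common use of rewards like these is to compare several candidate answers to the same prompt. The sketch below is an illustration under assumptions: `rank_responses` and `toy_score` are hypothetical helpers, and `toy_score` returns made-up values so the example runs without downloading the model.

```python
from typing import Callable, Dict, List, Tuple

Chat = List[Dict[str, str]]


def rank_responses(
    score_fn: Callable[[Chat], float],
    prompt: str,
    responses: List[str],
) -> List[Tuple[float, str]]:
    """Score each (prompt, response) chat and return the pairs sorted best-first."""
    scored = []
    for response in responses:
        chat = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ]
        scored.append((score_fn(chat), response))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical stand-in for the reward model: fixed, made-up rewards.
TOY_REWARDS = {"4": 1.5, "I don't know.": -0.7}


def toy_score(chat: Chat) -> float:
    # Looks up the made-up reward for the final assistant message.
    return TOY_REWARDS[chat[-1]["content"]]


ranking = rank_responses(toy_score, "What is 2 + 2?", ["I don't know.", "4"])
best_reward, best_response = ranking[0]
print(best_response, best_reward)
```

To score with the actual model, `score_fn` could render the chat with `rm_tokenizer.apply_chat_template`, pass the text to `rm_pipe`, and return `output[0]["score"]` for the single input.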
Evaluation
RewardBench Result
| Metric | Score |
|---|---|
| Chat | 97.21 |
| Chat Hard | 80.70 |
| Safety | 89.01 |
| Reasoning | 90.60 |
EvalBiasBench Result
| Metric | Score |
|---|---|
| Length | 82.4 |
| Concreteness | 92.9 |
| Empty Reference | 46.2 |
| Content Continuation | 100.0 |
| Nested Instruction | 83.3 |
| Familiar Knowledge | 58.3 |
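Benchmarks of this kind are typically scored as pairwise accuracy: the reward model counts as correct on an example when it assigns a higher reward to the preferred response than to the rejected one. The sketch below assumes the scores above are accuracies of that form, expressed in percent. `pairwise_accuracy` and the reward values are hypothetical and do not reproduce any reported number.

```python
from typing import List, Tuple


def pairwise_accuracy(reward_pairs: List[Tuple[float, float]]) -> float:
    """Return the percentage of (chosen, rejected) pairs where chosen scores higher."""
    if not reward_pairs:
        raise ValueError("need at least one (chosen, rejected) pair")
    correct = sum(1 for chosen, rejected in reward_pairs if chosen > rejected)
    return 100.0 * correct / len(reward_pairs)


# Made-up rewards for four preference pairs: (chosen, rejected).
toy_pairs = [(2.1, -0.3), (0.4, 0.9), (1.7, 1.2), (-0.5, -1.8)]

accuracy = pairwise_accuracy(toy_pairs)
print(f"{accuracy:.1f}")
```

Only the ordering of the two rewards matters here, so the absolute scale of the raw logits does not affect the metric.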
Citation
@misc{park2024offsetbias,
      title={OffsetBias: Leveraging Debiased Data for Tuning Evaluators},
      author={Junsoo Park and Seungyeon Jwa and Meiying Ren and Daeyoung Kim and Sanghyuk Choi},
      year={2024},
      eprint={2407.06551},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}