---
language:
- en
license: llama3
tags:
- text-classification
datasets:
- openbmb/UltraFeedback
- nvidia/HelpSteer
- Anthropic/hh-rlhf
- PKU-Alignment/PKU-SafeRLHF
- NCSOFT/offsetbias
base_model:
- sfairXC/FsfairX-LLaMA3-RM-v0.1
- meta-llama/Meta-Llama-3-8B-Instruct
---

# Model Card for Llama-3-OffsetBias-RM-8B

**Llama-3-OffsetBias-RM-8B** is a *reward model* trained on the OffsetBias dataset. It is trained to be more robust to various *biases* commonly found in evaluation models. The model is introduced in the paper **OffsetBias: Leveraging Debiased Data for Tuning Evaluators**.

## Model Details

### Model Description

**Llama-3-OffsetBias-RM-8B** uses [sfairXC/FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1) as its base model, which is built with Meta Llama 3. An intermediate reward model is trained from [Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) using a subset of the dataset used to train the *FsfairX-LLaMA3-RM* model, combined with the *NCSOFT/offsetbias* dataset. The intermediate model is then merged with the *FsfairX-LLaMA3-RM* model to create **Llama-3-OffsetBias-RM-8B**.

- **Developed by:** NC Research
- **Language(s) (NLP):** English
- **License:** llama3
- **Finetuned from model:** [sfairXC/FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1)
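
This card does not state which merging method was used to combine the intermediate model with *FsfairX-LLaMA3-RM*. As a rough illustration of what merging two checkpoints with identical architectures can look like, the sketch below linearly interpolates parameters by name. The `linear_merge` helper, the `alpha` weight, and the toy parameter names and values are all assumptions made for illustration, not the authors' procedure.

```python
from typing import Dict, List

# Toy stand-in for a state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def linear_merge(state_a: StateDict, state_b: StateDict, alpha: float = 0.5) -> StateDict:
    """Return alpha * state_a + (1 - alpha) * state_b, parameter by parameter.

    Hypothetical helper: linear interpolation is an assumed merge method,
    chosen only because it is the simplest one to show.
    """
    if state_a.keys() != state_b.keys():
        raise ValueError("both checkpoints must share the same parameter names")
    merged: StateDict = {}
    for name, weights_a in state_a.items():
        weights_b = state_b[name]
        if len(weights_a) != len(weights_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            alpha * a + (1.0 - alpha) * b for a, b in zip(weights_a, weights_b)
        ]
    return merged


# Made-up values, not real model weights.
intermediate_rm = {"score.weight": [0.2, -0.4], "layer.bias": [1.0]}
fsfairx_rm = {"score.weight": [0.6, 0.0], "layer.bias": [0.0]}
merged_rm = linear_merge(intermediate_rm, fsfairx_rm, alpha=0.5)
```

With real checkpoints the same loop would run over tensors rather than lists, and other merge strategies may behave quite differently from this simple average.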

### Model Sources

- 💻 **Repository:** [https://github.com/ncsoft/offsetbias](https://github.com/ncsoft/offsetbias)
- 📑 **Paper:** [OffsetBias: Leveraging Debiased Data for Tuning Evaluators](https://arxiv.org/abs/2407.06551)
- 🤗 **Dataset:** [https://huggingface.co/datasets/NCSOFT/offsetbias](https://huggingface.co/datasets/NCSOFT/offsetbias)

## Uses

### Direct Use

```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "NCSOFT/Llama-3-OffsetBias-RM-8B"
rm_tokenizer = AutoTokenizer.from_pretrained(model_name)

# The reward model is loaded through the text-classification pipeline
# ("sentiment-analysis" is an alias for it).
rm_pipe = pipeline(
    "sentiment-analysis",
    model=model_name,
    device_map="auto",  # place weights automatically; requires `accelerate`
    tokenizer=rm_tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
)

pipe_kwargs = {
    "return_all_scores": True,
    "function_to_apply": "none",  # return the raw reward, no sigmoid/softmax
    "batch_size": 1,
}

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

# Format the conversation with the chat template, then drop the BOS token
# because the pipeline's tokenizer adds it again.
test_texts = [
    rm_tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=False
    ).replace(rm_tokenizer.bos_token, "")
]
pipe_outputs = rm_pipe(test_texts, **pipe_kwargs)
rewards = [output[0]["score"] for output in pipe_outputs]
```
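
A common way to use a reward model is to score several candidate responses to the same prompt and keep the highest-scoring one. The sketch below shows that selection step in isolation. `build_chat`, `pick_best_response`, and the `score_fn` argument are hypothetical names. In practice `score_fn` would wrap the `rm_pipe` call from the snippet above (apply the chat template, run the pipeline, return the raw score); the lookup-table scorer used here is a stand-in with made-up rewards so the example runs without downloading the model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Chat = List[Dict[str, str]]


def build_chat(prompt: str, response: str) -> Chat:
    """Pair a user prompt with one candidate assistant response."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]


def pick_best_response(
    prompt: str,
    responses: Sequence[str],
    score_fn: Callable[[Chat], float],
) -> Tuple[str, List[float]]:
    """Score every candidate and return the best one plus all rewards."""
    if not responses:
        raise ValueError("need at least one candidate response")
    rewards = [score_fn(build_chat(prompt, r)) for r in responses]
    best_index = max(range(len(responses)), key=rewards.__getitem__)
    return responses[best_index], rewards


# Made-up rewards standing in for real reward-model outputs (assumption).
FAKE_REWARDS = {"Paris.": 1.5, "Lyon.": -0.7}


def dummy_score(chat: Chat) -> float:
    """Stand-in scorer: looks up a fixed reward for the last turn.

    Replace with a function that formats `chat` and calls the reward model.
    """
    return FAKE_REWARDS[chat[-1]["content"]]


best, candidate_rewards = pick_best_response(
    "What is the capital of France?",
    ["Paris.", "Lyon."],
    dummy_score,
)
```

Keeping the scorer as a plain callable means the selection logic can be tested on its own and reused with any reward model.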

## Evaluation

### RewardBench Result

| Metric       | Score  |
|--------------|--------|
| Chat         | 97.21  |
| Chat Hard    | 80.70  |
| Safety       | 89.01  |
| Reasoning    | 90.60  |

### EvalBiasBench Result

| Metric                | Score |
|-----------------------|-------|
| Length                | 82.4  |
| Concreteness          | 92.9  |
| Empty Reference       | 46.2  |
| Content Continuation  | 100.0 |
| Nested Instruction    | 83.3  |
| Familiar Knowledge    | 58.3  |
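
Assuming each score above is the percentage of preference pairs on which the preferred response receives the higher reward (the usual pairwise-accuracy metric for reward-model benchmarks), the computation can be sketched as follows. `pairwise_accuracy` is a hypothetical helper, ties are counted as incorrect by assumption, and the reward values are made up for illustration rather than taken from either benchmark.

```python
from typing import Sequence


def pairwise_accuracy(
    chosen_rewards: Sequence[float], rejected_rewards: Sequence[float]
) -> float:
    """Percentage of pairs where the chosen response outscores the rejected one."""
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("reward lists must have the same length")
    if not chosen_rewards:
        raise ValueError("need at least one pair")
    # A tie counts as a miss (assumption).
    wins = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return 100.0 * wins / len(chosen_rewards)


# Made-up rewards for four pairs, not benchmark data.
toy_chosen = [2.1, 0.3, -0.5, 1.8]
toy_rejected = [1.0, 0.9, -1.2, 0.4]
toy_accuracy = pairwise_accuracy(toy_chosen, toy_rejected)
```

Only the ordering within each pair matters for this metric, so the absolute scale of the rewards has no effect on the result.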

## Citation

```bibtex
@misc{park2024offsetbias,
      title={OffsetBias: Leveraging Debiased Data for Tuning Evaluators},
      author={Junsoo Park and Seungyeon Jwa and Meiying Ren and Daeyoung Kim and Sanghyuk Choi},
      year={2024},
      eprint={2407.06551},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```