metadata
language:
  - en
license: llama3
tags:
  - text-classification
datasets:
  - openbmb/UltraFeedback
  - nvidia/HelpSteer
  - Anthropic/hh-rlhf
  - PKU-Alignment/PKU-SafeRLHF
  - NCSOFT/offsetbias
base_model:
  - sfairXC/FsfairX-LLaMA3-RM-v0.1
  - meta-llama/Meta-Llama-3-8B-Instruct

Model Card for Llama-3-OffsetBias-RM-8B

Llama-3-OffsetBias-RM-8B is a reward model trained on the OffsetBias dataset. It is trained to be more robust to various biases commonly found in evaluation models. The model is introduced in the paper OffsetBias: Leveraging Debiased Data for Tuning Evaluators.

Model Details

Model Description

Llama-3-OffsetBias-RM-8B uses sfairXC/FsfairX-LLaMA3-RM-v0.1 as its base model, which is built with Meta Llama 3. An intermediate reward model is first trained from Llama-3-8B-Instruct on a subset of the data used to train the FsfairX-LLaMA3-RM model, combined with the NCSOFT/offsetbias dataset. The intermediate model is then merged with the FsfairX-LLaMA3-RM model to create Llama-3-OffsetBias-RM-8B.
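The merge step can be pictured as combining two checkpoints parameter by parameter. This card does not state which merge method was used, so the sketch below substitutes plain linear interpolation purely for illustration. The function name `merge_linear`, the `alpha` weight, and the toy parameter values are all hypothetical, and real checkpoints would be tensors rather than Python lists.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def merge_linear(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Interpolate two checkpoints per parameter: alpha * a + (1 - alpha) * b.

    Linear interpolation is an assumption made for illustration; the model
    card does not say which merge method the authors used.
    """
    if a.keys() != b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [
            alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])
        ]
    return merged


# Toy stand-ins for the two reward models' parameters (hypothetical values).
intermediate_rm = {"score.weight": [0.2, -0.4], "layer.bias": [1.0]}
fsfairx_rm = {"score.weight": [0.6, 0.0], "layer.bias": [3.0]}

merged_rm = merge_linear(intermediate_rm, fsfairx_rm, alpha=0.5)
```

Both models share the Llama 3 8B architecture, which is what makes a parameter-wise merge of this kind possible in the first place.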

Model Sources

Uses

Direct Use

  import torch
  from transformers import AutoTokenizer, pipeline

  model_name = "NCSOFT/Llama-3-OffsetBias-RM-8B"
  rm_tokenizer = AutoTokenizer.from_pretrained(model_name)
  # The reward model is served through the text-classification pipeline
  # ("sentiment-analysis" is an alias for that task).
  rm_pipe = pipeline(
      "sentiment-analysis",
      model=model_name,
      device_map="auto",  # automatic placement; requires the accelerate package
      tokenizer=rm_tokenizer,
      model_kwargs={"torch_dtype": torch.bfloat16}
  )

  pipe_kwargs = {
      "return_all_scores": True,
      "function_to_apply": "none",  # keep the raw logit as the reward score
      "batch_size": 1
  }

  chat = [
   {"role": "user", "content": "Hello, how are you?"},
   {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
   {"role": "user", "content": "I'd like to show off how chat templating works!"},
  ]

  # Render the conversation with the chat template, then strip the BOS token
  # so that it is not duplicated when the pipeline tokenizes the text.
  test_texts = [rm_tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False).replace(rm_tokenizer.bos_token, "")]
  pipe_outputs = rm_pipe(test_texts, **pipe_kwargs)
  rewards = [output[0]["score"] for output in pipe_outputs]
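A single reward is mostly useful when compared against the reward of another response to the same prompt. As a minimal sketch, assuming the scores are read under a Bradley-Terry preference model (an assumption, since this card does not describe the training objective), the difference between two rewards can be turned into a preference probability. The function name and the reward values below are hypothetical.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """P(response A is preferred over B) = sigmoid(reward_a - reward_b).

    Assumes a Bradley-Terry reading of the reward scores.
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


# Hypothetical rewards for two candidate responses to the same prompt.
reward_a, reward_b = 1.5, -0.5

p_a_over_b = preference_probability(reward_a, reward_b)
preferred = "A" if reward_a > reward_b else "B"
```

Only the difference between the two rewards matters here, so the absolute scale of a single score should not be over-interpreted.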

Evaluation

RewardBench Result

Metric      Score
Chat        97.21
Chat Hard   80.70
Safety      89.01
Reasoning   90.60
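Scores like these are typically pairwise accuracies: the percentage of (chosen, rejected) response pairs for which the reward model scores the chosen response higher. The sketch below shows that calculation on made-up reward pairs. The function name, the toy data, and the choice to count ties as errors are assumptions, and this is not the benchmark's own evaluation code.

```python
from typing import List, Tuple


def pairwise_accuracy(pairs: List[Tuple[float, float]]) -> float:
    """Percentage of (chosen, rejected) reward pairs ranked correctly.

    A pair counts as correct only when the chosen response scores strictly
    higher; treating ties as errors is an assumption of this sketch.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return 100.0 * correct / len(pairs)


# Made-up (chosen_reward, rejected_reward) pairs, for illustration only.
toy_pairs = [(2.1, 0.3), (0.5, 1.2), (1.0, -0.7), (3.3, 3.0)]

accuracy = pairwise_accuracy(toy_pairs)
```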

EvalBiasBench Result

Metric                 Score
Length                 82.4
Concreteness           92.9
Empty Reference        46.2
Content Continuation   100.0
Nested Instruction     83.3
Familiar Knowledge     58.3

Citation

@misc{park2024offsetbias,
      title={OffsetBias: Leveraging Debiased Data for Tuning Evaluators},
      author={Junsoo Park and Seungyeon Jwa and Meiying Ren and Daeyoung Kim and Sanghyuk Choi},
      year={2024},
      eprint={2407.06551},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}