Model Card for Llama-3-8B-Instruct-Iterative-SamPO

This repository provides a fine-tuned version of Llama-3-8B-Instruct, trained with our proposed SamPO algorithm, introduced in "Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence". We comply with all licenses stated in the Llama 3 release.
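
To make the idea behind the title concrete, here is a minimal sketch of a length-balanced DPO-style loss. It assumes that SamPO down-samples the per-token log-probability ratios of the chosen and rejected responses to the same count before forming the usual DPO margin; this card does not spell out the procedure, so the function name `sampo_style_loss` and every detail below are illustrative assumptions rather than the released training code.

```python
import math
import random


def sampo_style_loss(chosen_logratios, rejected_logratios, beta=0.1, rng=None):
    """Hypothetical sketch of a down-sampled DPO loss for one preference pair.

    Each list holds per-token values of log pi_theta(token) - log pi_ref(token).
    Assumption: both responses are down-sampled to the shorter token count.
    """
    rng = rng or random.Random(0)
    k = min(len(chosen_logratios), len(rejected_logratios))
    # Draw k tokens without replacement from each response.
    chosen_sample = rng.sample(chosen_logratios, k)
    rejected_sample = rng.sample(rejected_logratios, k)
    margin = beta * (sum(chosen_sample) - sum(rejected_sample))
    # -log(sigmoid(margin)), written to avoid overflow for large |margin|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# With identical per-token ratios, a 10-token rejected response no longer
# dominates a 3-token chosen response: the margin is zero after down-sampling.
loss = sampo_style_loss([0.5] * 3, [0.5] * 10)
```

With plain sequence-level sums, the longer response in this example would contribute a larger total purely because it has more tokens; sampling the same number of tokens from both sides removes that dependence on length.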

Performance

| Model | GSM8K | IFEval | PiQA | MMLU | TruthfulQA | AlpacaEval2 | LC AlpacaEval2 | Length in Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama3-8B-Instruct | 75.06 | 49.40 | 80.69 | 63.85 | 36.47 | 22.57 | 22.92 | 421 |
| Llama3-8B-Instruct-DPO | 75.59 | 51.80 | 81.94 | 64.06 | 40.39 | 23.34 | 23.20 | 422 |
| Llama3-8B-Instruct-Iterative-DPO | 74.91 | 52.52 | 81.66 | 64.02 | 39.90 | 23.92 | 25.50 | 403 |
| Llama3-8B-Instruct-Iterative-SamPO | 77.81 | 60.55 | 81.18 | 64.12 | 44.07 | 30.68 | 35.14 | 377 |

Evaluation Details

Five conditional benchmarks, using lm-evaluation-harness:

  • GSM8K: 8-shot, report strict match
  • IFEval: 3-shot, report instruction-level strict accuracy
  • PiQA: 3-shot, report accuracy
  • MMLU: 0-shot, report normalized accuracy
  • TruthfulQA: 3-shot, report accuracy of single-true mc1 setting
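
The five settings above could be driven from a small script. The sketch below only assembles command lines for the lm-evaluation-harness CLI and does not run them. The task identifiers (`gsm8k`, `ifeval`, `piqa`, `mmlu`, `truthfulqa_mc1`), the `hf` backend, and the helper name `lm_eval_command` are assumptions for illustration; this card does not list the exact invocation that was used.

```python
# Few-shot counts taken from the list above; the task names are assumed to
# match the identifiers registered in lm-evaluation-harness.
FEWSHOT = {
    "gsm8k": 8,
    "ifeval": 3,
    "piqa": 3,
    "mmlu": 0,
    "truthfulqa_mc1": 3,
}


def lm_eval_command(task, model_id="Junrulu/Llama-3-8B-Instruct-Iterative-SamPO"):
    """Build (but do not execute) an lm_eval command line for one task."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", task,
        "--num_fewshot", str(FEWSHOT[task]),
    ]


commands = [lm_eval_command(task) for task in FEWSHOT]
```

Each resulting list can be passed to `subprocess.run` once the harness is installed; which metric to read from the output follows the per-benchmark notes above.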

One open-ended benchmark, using official alpaca_eval:

  • AlpacaEval2: win rate (%) of the model's outputs against GPT-4-turbo's responses, as judged by GPT-4-turbo
  • LC AlpacaEval2: length-debiased (length-controlled) win rate (%) on AlpacaEval2
  • Length in Tokens: average length of the model's AlpacaEval2 outputs, measured in tokens with the Llama 3 tokenizer

Input Format

The model is trained to use the following format:

<|start_header_id|>user<|end_header_id|>

{PROMPT}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

{Response}
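
As a minimal sketch, the template above can be filled in with a small helper. It assumes the line breaks shown in the template are literal newline characters and that the model's response begins right after the blank line following the assistant header; `build_prompt` is a hypothetical name, and whether a beginning-of-text token is prepended is left to the tokenizer, since this card does not specify it.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a single user turn in the input format shown above.

    The returned string ends where the model is expected to start
    generating its response.
    """
    return (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>\n"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


prompt = build_prompt("Explain what a win rate is in one sentence.")
```

The resulting string can be tokenized and passed to the model's generation call; generation would normally stop when the model emits `<|eot_id|>`.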

Training hyperparameters

The following hyperparameters were used during DPO/SamPO training:

  • DPO beta: 0.1
  • learning_rate: 4e-7
  • total_train_batch_size: 128
  • optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-8
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • weight_decay: 0.0
  • num_epochs: 3.0
  • The input format above is applied to every training sample
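
To illustrate how the learning-rate settings interact, here is a minimal sketch of a linear schedule with warmup. It assumes the common convention of a linear ramp from 0 to the peak rate over the first 10% of steps, followed by linear decay to 0; the helper name `linear_schedule_lr` is hypothetical, and the total number of optimizer steps depends on the dataset size, which this card does not state.

```python
import math


def linear_schedule_lr(step, total_steps, peak_lr=4e-7, warmup_ratio=0.1):
    """Learning rate at a given optimizer step (assumed linear warmup/decay)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr during warmup.
        return peak_lr * step / warmup_steps
    # Linear decay from peak_lr down to 0 at the final step.
    decay_steps = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / decay_steps)


# Example with an assumed run of 1000 optimizer steps (batch size 128 each).
schedule = [linear_schedule_lr(s, 1000) for s in (0, 50, 100, 550, 1000)]
```

Under these assumptions the rate peaks at 4e-7 once warmup ends (step 100 in the example) and returns to zero at the last step.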

Model size: 8.03B params (Safetensors, tensor type FP16)