🐦 MagpieLM-8B-Chat-v0.1

🧐 About This Model

Model full name: Llama3.1-MagpieLM-8B-Chat-v0.1

This model is an aligned version of meta-llama/Meta-Llama-3.1-8B that achieves state-of-the-art performance among openly aligned small language models (SLMs). It outperforms open-weight models of similar or larger size, including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct, Qwen-2-7B-Instruct, and Gemma-2-9B-it.

We apply the following standard alignment pipeline with two carefully crafted synthetic datasets.

We first perform SFT using Magpie-Align/MagpieLM-SFT-Data-v0.1.

SFT Model Checkpoint: Magpie-Align/MagpieLM-8B-SFT-v0.1

We then perform DPO on the Magpie-Align/MagpieLM-DPO-Data-v0.1 dataset.

🔥 Benchmark Performance

Greedy Decoding

Alpaca Eval 2: 58.18 (LC), 62.38 (WR)
Arena Hard: 48.4
WildBench WB Score (v2.0625): 44.72

Benchmark Performance Compared to Other SOTA SLMs

👀 Other Information

License: Please follow the Meta Llama 3.1 Community License.

Conversation Template: Please use the Llama 3 chat template for the best performance.

Limitations: This model primarily understands and generates content in English. Its outputs may contain factual errors, logical inconsistencies, or reflect biases present in the training data. While the model aims to improve instruction-following and helpfulness, it is not specifically designed for complex reasoning tasks and may perform suboptimally in these areas. Additionally, the model may produce unsafe or inappropriate content, as no specific safety training was implemented during the alignment process.
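To make the conversation template note above concrete, here is a minimal sketch of the prompt layout commonly associated with the Llama 3 chat template. The function name render_llama3_prompt is hypothetical and exists only for illustration. The template bundled with the model's tokenizer is authoritative and may differ in details (for example, extra system-header content), so in practice tokenizer.apply_chat_template should be preferred over hand-built strings.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Render chat messages into a Llama 3 style prompt string (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the trimmed content, and an end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header cues the model to write the next turn.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]
prompt = render_llama3_prompt(messages)
print(prompt)
```

Sending plain text without this structure is likely to degrade output quality, because the model was fine-tuned on conversations rendered in this format.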

🧐 How to use it?

First, update transformers to the latest version: pip install git+https://github.com/huggingface/transformers

You can then run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function.

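import torch
import transformers

# Full Hugging Face Hub repository ID (organization/model name)
model_id = "Magpie-Align/MagpieLM-8B-Chat-v0.1"

# Build a text-generation pipeline; weights are loaded in bfloat16 and placed automatically
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# generated_text holds the whole conversation; the last entry is the assistant's reply
print(outputs[0]["generated_text"][-1])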
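The text above also mentions the Auto classes with generate(), but only the pipeline route is shown. The following is a sketch of that second route under stated assumptions: the helper names build_messages and generate_reply are hypothetical, greedy decoding (do_sample=False) is chosen here to mirror the greedy-decoding benchmark setting, and the heavy imports are deferred so the file can be imported without loading the model.

```python
def build_messages(user_prompt, system_prompt="You are Magpie, a friendly AI assistant."):
    """Assemble a chat in the role/content format expected by chat templates."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_reply(messages, model_id="Magpie-Align/MagpieLM-8B-Chat-v0.1", max_new_tokens=256):
    """Generate one assistant reply with AutoTokenizer / AutoModelForCausalLM."""
    # Imported lazily: loading torch and transformers (and the 8B weights) is expensive.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Render the chat with the tokenizer's bundled template and tokenize it.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens so that only the newly generated reply is decoded.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling print(generate_reply(build_messages("Who are you?"))) downloads the weights on first use and needs enough accelerator memory for an 8B model in bfloat16.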

Alignment Pipeline

The detailed alignment pipeline is as follows.

Stage 1: Supervised Fine-tuning

We use Axolotl for SFT. Please refer to the model card of the SFT checkpoint and the configuration below for details.

See axolotl config

axolotl version: 0.4.1

base_model: meta-llama/Meta-Llama-3.1-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
chat_template: llama3

load_in_8bit: false
load_in_4bit: false
strict: false
main_process_port: 0

datasets:
  - path: Magpie-Align/MagpieLM-SFT-Data-v0.1
    type: sharegpt
    conversation: llama3

dataset_prepared_path: last_run_prepared
val_set_size: 0.001
output_dir: axolotl_out/MagpieLM-8B-SFT-v0.1

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: SynDa
wandb_entity:
wandb_watch:
wandb_name: MagpieLM-8B-SFT-v0.1
wandb_log_model:
hub_model_id: Magpie-Align/MagpieLM-8B-SFT-v0.1

gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 5
eval_table_size:
saves_per_epoch: 
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
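The config above combines lr_scheduler: cosine with warmup_ratio: 0.1 and learning_rate: 2e-5. As a rough illustration of what that schedule looks like, the sketch below reimplements a linear warmup followed by cosine decay in plain Python. The total step count of 1000 is a made-up value (the real number depends on dataset size and GPU count, which the config does not state), and the function cosine_lr_with_warmup is a hypothetical stand-in for the scheduler that the training framework builds internally.

```python
import math


def cosine_lr_with_warmup(step, total_steps, peak_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; not taken from the actual training run
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS))
```

With these assumed numbers the rate peaks at step 100 (10% of training), falls to half the peak midway through the decay phase, and reaches zero at the final step.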

Stage 2: Direct Preference Optimization

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-07
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 16
total_train_batch_size: 128
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
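The two total batch sizes listed above follow directly from the per-device sizes, the device count, and gradient accumulation. The short check below reproduces that arithmetic; the variable names are chosen for illustration and only the numbers come from the hyperparameter list.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2             # per device
eval_batch_size = 4              # per device
num_devices = 4
gradient_accumulation_steps = 16

# Each optimizer step accumulates gradients over several micro-batches on every device.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation has no gradient accumulation, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Reproducing the run on a different number of GPUs would mean adjusting gradient_accumulation_steps so that the product stays at 128, assuming the same effective batch size is wanted.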

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.686	0.0653	100	0.6856	-0.0491	-0.0616	0.6480	0.0125	-471.3315	-478.8181	-0.7034	-0.7427
0.6218	0.1306	200	0.6277	-0.6128	-0.7720	0.6960	0.1591	-542.3653	-535.1920	-0.7771	-0.8125
0.5705	0.1959	300	0.5545	-2.4738	-3.0052	0.7270	0.5314	-765.6894	-721.2881	-0.7894	-0.8230
0.4606	0.2612	400	0.5081	-2.6780	-3.3782	0.7560	0.7002	-802.9893	-741.7116	-0.6813	-0.7247
0.4314	0.3266	500	0.4787	-3.6697	-4.6026	0.7630	0.9329	-925.4283	-840.8740	-0.6189	-0.6691
0.449	0.3919	600	0.4533	-3.7414	-4.8019	0.7820	1.0604	-945.3563	-848.0514	-0.6157	-0.6681
0.4538	0.4572	700	0.4350	-4.3858	-5.6549	0.7890	1.2690	-1030.6561	-912.4920	-0.5789	-0.6331
0.35	0.5225	800	0.4186	-4.7129	-6.1662	0.8010	1.4533	-1081.7843	-945.1964	-0.5778	-0.6347
0.4153	0.5878	900	0.4108	-4.9836	-6.5320	0.7970	1.5484	-1118.3677	-972.2631	-0.5895	-0.6474
0.3935	0.6531	1000	0.3999	-4.4303	-5.9370	0.8110	1.5067	-1058.8646	-916.9379	-0.6016	-0.6598
0.3205	0.7184	1100	0.3950	-5.1884	-6.8827	0.8010	1.6943	-1153.4371	-992.7452	-0.5846	-0.6452
0.3612	0.7837	1200	0.3901	-5.0426	-6.7179	0.8040	1.6753	-1136.9619	-978.1701	-0.6046	-0.6637
0.3058	0.8490	1300	0.3877	-5.1224	-6.8428	0.8040	1.7204	-1149.4465	-986.1475	-0.6087	-0.6690
0.3467	0.9144	1400	0.3871	-5.2335	-6.9809	0.8090	1.7474	-1163.2629	-997.2610	-0.6071	-0.6672
0.3197	0.9797	1500	0.3867	-5.1502	-6.8793	0.8080	1.7291	-1153.0979	-988.9237	-0.6120	-0.6722
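For readers unfamiliar with the reward columns in the table above, the sketch below shows how they relate under the standard DPO formulation: the implicit reward is beta times the log-probability gap between the policy and the reference model, the margin is the chosen reward minus the rejected reward, and the per-pair loss is the negative log-sigmoid of that margin. The function names are hypothetical, beta = 0.01 is taken from the DPO config in this card, and the reward values are copied from the first and last table rows. Note that the table reports averages over the evaluation set, so plugging an average margin into the loss does not reproduce the reported validation loss.

```python
import math


def implicit_reward(policy_logp, reference_logp, beta=0.01):
    """DPO implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - reference_logp)


def dpo_pair_loss(reward_chosen, reward_rejected):
    """Per-pair DPO loss, -log(sigmoid(margin)), computed in a numerically stable way."""
    margin = reward_chosen - reward_rejected
    # softplus(-margin) = max(-margin, 0) + log(1 + exp(-|margin|))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Rewards/chosen and Rewards/rejected from the step-100 and step-1500 rows.
margin_step_100 = -0.0491 - (-0.0616)
margin_step_1500 = -5.1502 - (-6.8793)
print(round(margin_step_100, 4), round(margin_step_1500, 4))

# With no preference signal (zero margin) the loss equals ln 2, about 0.693,
# which is close to the loss values logged at the start of training.
print(dpo_pair_loss(0.0, 0.0))
```

Both rewards become more negative during training while the margin grows, which means the policy lowers the likelihood of rejected responses faster than that of chosen ones.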

Framework versions

Transformers 4.44.2
Pytorch 2.4.1+cu121
Datasets 3.0.0
Tokenizers 0.19.1

See alignment handbook configs

# Customized Configs
model_name_or_path: Magpie-Align/MagpieLM-8B-SFT-v0.1
hub_model_id: Magpie-Align/MagpieLM-8B-Chat-v0.1
output_dir: alignment_handbook_out/MagpieLM-8B-Chat-v0.1
run_name: MagpieLM-8B-Chat-v0.1

dataset_mixer:
   Magpie-Align/MagpieLM-DPO-Data-v0.1: 1.0
dataset_splits:
- train
- test
preprocessing_num_workers: 24

# DPOTrainer arguments
bf16: true
beta: 0.01
learning_rate: 2.0e-7
gradient_accumulation_steps: 16
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
num_train_epochs: 1
max_length: 2048
max_prompt_length: 1800
warmup_ratio: 0.1
logging_steps: 1
lr_scheduler_type: cosine
optim: adamw_torch

torch_dtype: null
# use_flash_attention_2: true
do_eval: true
evaluation_strategy: steps
eval_steps: 100
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
log_level: info
push_to_hub: true
save_total_limit: 0
seed: 42
report_to:
- wandb

📚 Citation

If you find the model, data, or code useful, please cite:

@article{xu2024magpie,
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

@article{xu2024stronger,
  title={Stronger Models are NOT Stronger Teachers for Instruction Tuning},
  author={Xu, Zhangchen and Jiang, Fengqing and Niu, Luyao and Lin, Bill Yuchen and Poovendran, Radha},
  journal={arXiv preprint arXiv:2411.07133},
  year={2024}
}

Contact

Questions? Contact:

Zhangchen Xu [zxu9 at uw dot edu], and
Bill Yuchen Lin [yuchenlin1995 at gmail dot com]
