llama-3-sauce-v2-8B

This model is based on Llama-3-8B and is governed by the META LLAMA 3 COMMUNITY LICENSE AGREEMENT.

This is a bad finetune of nbeerbower/llama-3-spicy-abliterated-stella-8B using several DPO datasets.

Chat Format

Please use the ChatML format or you may experience poor results.

<|im_start|>system
{System Prompt Here!}<|im_end|>
<|im_start|>assistant
{Message from AI}<|im_end|>
<|im_start|>user
{Message from User}<|im_end|>
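
For illustration, here is a minimal Python sketch of assembling a ChatML prompt by hand. The function name `build_chatml_prompt` and the example messages are hypothetical and not part of this model's tooling; the sketch only shows how the tags fit together.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string."""
    parts = []
    for message in messages:
        parts.append(
            "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Hypothetical conversation used only to show the layout.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

Generation would normally stop when the model emits `<|im_end|>`.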

Method

Finetuned using an A100 on Google Colab.

The training setup follows "Fine-tune a Mistral-7b model with Direct Preference Optimization" by Maxime Labonne.
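
As a rough illustration of the objective, the per-example DPO loss is -log sigmoid(beta * [(log pi(chosen) - log pi_ref(chosen)) - (log pi(rejected) - log pi_ref(rejected))]). The sketch below evaluates it in plain Python on made-up log-probabilities. The function name and the numbers are hypothetical, and this is not the TRL implementation; beta = 0.1 matches the value passed to `DPOTrainer` under Configuration.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities."""
    # How much more the policy favors each answer than the reference does.
    chosen_reward = policy_chosen - ref_chosen
    rejected_reward = policy_rejected - ref_rejected
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) is the same as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# At the start of training the policy equals the reference, so the margin
# is 0 and the loss is log(2).
start_loss = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Made-up numbers: the policy has raised the chosen answer and lowered the
# rejected one relative to the reference, so the loss drops below log(2).
later_loss = dpo_loss(-10.0, -18.0, -12.0, -15.0)
print(start_loss, later_loss)
```

A larger beta penalizes drift from the reference model more sharply for the same log-probability gap.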

Configuration

Dataset preparation:

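from datasets import concatenate_datasets, load_dataset
from transformers import AutoTokenizer

# Base model being fine-tuned (named at the top of this card)
model_name = "nbeerbower/llama-3-spicy-abliterated-stella-8B"

def chatml_format(example):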
    # Format system
    system = ""
    if example.get('system') and len(example['system']) > 0:
        systemMessage = example['system']
        system = "<|im_start|>system\n" + systemMessage + "<|im_end|>\n"

    # Format instruction
    prompt = "<|im_start|>user\n" + example['prompt'] + "<|im_end|>\n<|im_start|>assistant\n"

    # Format chosen answer
    chosen = example['chosen'] + "<|im_end|>\n"

    # Format rejected answer
    rejected = example['rejected'] + "<|im_end|>\n"

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }

# Array of datasets to concat
ds = [
    "jondurbin/truthy-dpo-v0.1",
    "jondurbin/gutenberg-dpo-v0.1",
    "flammenai/FlameMix-DPO-v1"
]

# load_dataset and combine all
loaded_datasets = [load_dataset(dataset_name, split='train') for dataset_name in ds]
dataset = concatenate_datasets(loaded_datasets)

# Save columns
original_columns = dataset.column_names

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Format dataset
dataset = dataset.map(
    chatml_format,
    remove_columns=original_columns
)
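
To make the output shape concrete, here is a small self-contained sketch that applies the same formatting to a made-up record. The function `to_chatml_pair` is a condensed, hypothetical stand-in for `chatml_format` above, and the sample record is invented rather than taken from any of the listed datasets.

```python
def to_chatml_pair(example):
    """Condensed stand-in for chatml_format: same keys, same string layout."""
    system = ""
    if example.get("system"):
        system = "<|im_start|>system\n" + example["system"] + "<|im_end|>\n"
    prompt = (
        "<|im_start|>user\n" + example["prompt"]
        + "<|im_end|>\n<|im_start|>assistant\n"
    )
    return {
        "prompt": system + prompt,
        "chosen": example["chosen"] + "<|im_end|>\n",
        "rejected": example["rejected"] + "<|im_end|>\n",
    }


# Invented preference record, for illustration only.
record = {
    "system": "",
    "prompt": "Name a primary color.",
    "chosen": "Red is a primary color.",
    "rejected": "Green, probably?",
}
formatted = to_chatml_pair(record)
print(repr(formatted["prompt"]))
```

The prompt ends with an open assistant turn, so `chosen` and `rejected` each read as a completion closed by `<|im_end|>`. An empty `system` field produces no system block at all.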

LoRA, model, and training settings:

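import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# Output directory for the fine-tuned model (named after this card)
new_model = "llama-3-sauce-v2-8B"

# LoRA configuration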
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)
# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False
# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    max_steps=4000,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)
# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    force_use_ref_model=True
)
# Fine-tune model with DPO
dpo_trainer.train()
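
A back-of-the-envelope sketch of what these settings imply, assuming a single GPU and the usual linear-warmup-then-cosine shape. The helper `lr_at` is hypothetical and only approximates what the trainer's scheduler does. With a batch size of 1 and no gradient accumulation, 4000 steps means 4000 preference pairs are processed, and the learning rate ramps up to 3e-5 over the first 100 steps before decaying toward zero.

```python
import math

# Values copied from the training arguments above.
BASE_LR = 3e-5
WARMUP_STEPS = 100
MAX_STEPS = 4000
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 1


def lr_at(step):
    """Approximate learning rate after `step` optimizer steps."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Cosine decay from BASE_LR to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumes one device, so the effective batch size is 1.
pairs_seen = PER_DEVICE_BATCH * GRAD_ACCUM * MAX_STEPS
for step in (0, 50, 100, 2050, 4000):
    print(step, lr_at(step))
```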

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 70.38 |
| AI2 Reasoning Challenge (25-Shot) | 65.61 |
| HellaSwag (10-Shot) | 83.11 |
| MMLU (5-Shot) | 67.98 |
| TruthfulQA (0-shot) | 56.39 |
| Winogrande (5-shot) | 76.72 |
| GSM8k (5-shot) | 72.48 |