This is a model from blockchainlab test 2.4 - alnrg2arg/blockchainlabs_7B_merged_test2_4.

The project aims to build a small LLM for on-device use.

The overall pipeline for this iteration is:

1. Merge models to make a base model (7B).
2. Prune the model to reduce the parameter count (50% sparsity).
3. Use DPO for the recovery phase after pruning.
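
To make the 50% sparsity target concrete, here is a minimal sketch in plain Python. The card does not say which pruning method is used, so unstructured magnitude pruning is an assumption here, and `magnitude_prune`, `sparsity_of`, and `toy_weights` are hypothetical names for illustration only.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of the weights.

    Assumption: unstructured magnitude pruning; the card does not
    state which pruning method the project actually uses.
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


def sparsity_of(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Toy weight vector standing in for one layer of the model.
toy_weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.6, -0.2, 0.1]
pruned = magnitude_prune(toy_weights, sparsity=0.5)
print(pruned)
print(sparsity_of(pruned))
```

Half of the entries end up as zeros, which is what "50% sparsity" means; the recovery phase then tries to win back the quality lost by removing them.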

This model is not pruned; it is intended as a baseline for comparison with the pruned model.

This is the code and the parameters I chose for this model (DPO).

import torch  # needed for torch.cuda.is_bf16_supported() below
from transformers import TrainingArguments, AutoModelForCausalLM
from trl import DPOTrainer

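# `model`, `tokenizer`, and `dataset` are assumed to be loaded beforehand.
dpo_trainer = DPOTrainer(
    model = model,
    # With ref_model = None, TRL creates the reference model from `model`.
    ref_model = None,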
    args = TrainingArguments(
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 8,
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        learning_rate = 5e-6,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.0,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "output_DPO",
    ),
    beta = 0.1,
    train_dataset = dataset,
    # eval_dataset = raw_datasets["test"],
    tokenizer = tokenizer,
    max_length = 1024,
    max_prompt_length = 512,
)

The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
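
To show what `beta = 0.1` controls, here is a minimal sketch of the per-example DPO objective in plain Python. It assumes the standard sigmoid form of the DPO loss; the function name `dpo_loss` and the log-probability values are hypothetical and only illustrate the arithmetic, not the trainer's internals.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one preference pair (illustrative sketch)."""
    # How strongly each model prefers the chosen answer over the rejected one.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    # beta scales how far the policy is rewarded for moving away from the reference.
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(logits)) written as log(1 + exp(-logits)).
    return math.log1p(math.exp(-logits))


# Hypothetical summed log-probabilities of two completions.
loss_at_init = dpo_loss(-12.0, -15.0, -12.0, -15.0)  # policy equals reference
loss_improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)  # policy prefers chosen more
print(loss_at_init, loss_improved)
```

When the policy equals the reference, the loss is log 2; it falls as the policy widens its preference for the chosen answer relative to the reference.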

Benchmark Scores

| Tasks / Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.6894 | ± 0.0135 |
| arc_challenge | 1 | none | 0 | acc_norm | 0.6860 | ± 0.0136 |
| hellaswag | 1 | none | 0 | acc | 0.7092 | ± 0.0045 |
| hellaswag | 1 | none | 0 | acc_norm | 0.8736 | ± 0.0033 |
| truthfulqa_mc2 | 2 | none | 0 | acc | 0.7126 | ± 0.015 |
| mmlu | N/A | none | 0 | acc | 0.6225 | ± 0.1292 |
| - humanities | N/A | none | 0 | acc | 0.5745 | ± 0.1286 |
| - other | N/A | none | 0 | acc | 0.6952 | ± 0.1095 |
| - social_sciences | N/A | none | 0 | acc | 0.7280 | ± 0.0735 |
| - stem | N/A | none | 0 | acc | 0.5195 | ± 0.1313 |
| winogrande | 1 | none | 0 | acc | 0.824 | ± 0.0107 |
| gsm8k | 2 | get-answer | 5 | exact_match | 0.7263 | ± 0.0123 |

Average = 74.08
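
The card does not say which metrics go into the average. The reported figure is consistent with the mean of six headline scores: acc_norm for arc_challenge and hellaswag, acc for truthfulqa_mc2, mmlu, and winogrande, and exact_match for gsm8k. That selection is an inference from the numbers, sketched below.

```python
# Assumed selection of one headline metric per benchmark (not stated in the card).
scores = {
    "arc_challenge (acc_norm)": 0.6860,
    "hellaswag (acc_norm)": 0.8736,
    "truthfulqa_mc2 (acc)": 0.7126,
    "mmlu (acc)": 0.6225,
    "winogrande (acc)": 0.824,
    "gsm8k (exact_match)": 0.7263,
}

# Mean of the six scores, expressed as a percentage.
average = 100 * sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 74.08
```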
