Depth-pruned and fine-tuned Llama-3.1-8B
```python
from unsloth import FastLanguageModel, is_bfloat16_supported
from trl import SFTTrainer
from transformers import TrainingArguments

max_seq_length = 2048  # assumed value; the original snippet does not define it

# Load the depth-pruned base model and tokenizer (loading step assumed).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Na0s/Llama-3.1-8B-Pruned-4-Layers",
    max_seq_length = max_seq_length,
)

# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0.05,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

# Supervised fine-tuning on a dataset exposing a plain-text "completion" field
# (see the Nectar preprocessing sketch below).
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "completion",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 6,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 5000,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs_2",
        push_to_hub = True,
        hub_always_push = True,
    ),
)

trainer.train()
```
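After training, the LoRA-adapted model can be used for generation directly from the same session. A minimal sketch, assuming the `model` and `tokenizer` objects from the snippet above; the prompt and generation settings are illustrative only:

```python
from unsloth import FastLanguageModel

# Switch Unsloth into inference mode (enables its faster generation path).
FastLanguageModel.for_inference(model)

# Illustrative prompt; the prompt format used for training is not documented here.
inputs = tokenizer("Explain depth pruning in one sentence.", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```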
Dataset: berkeley-nest/Nectar
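The trainer reads a plain-text `completion` field, while raw Nectar rows carry a prompt plus a ranked list of answers, so some preprocessing is implied. A hedged sketch of one plausible mapping (the exact preprocessing used for this model is not documented here, and the `to_completion` helper is illustrative):

```python
from datasets import load_dataset

nectar = load_dataset("berkeley-nest/Nectar", split="train")

def to_completion(example):
    # Assumption: keep the prompt followed by its top-ranked (rank == 1) answer.
    best = min(example["answers"], key=lambda a: a["rank"])
    return {"completion": example["prompt"] + best["answer"]}

dataset = nectar.map(to_completion, num_proc=2)
```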
MMLU-Pro 0-shot: 0.2927 (evaluated on TIGER-AI-Lab/MMLU-Pro)
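A hedged sketch of how a 0-shot MMLU-Pro score can be reproduced with lm-evaluation-harness; this assumes a recent release that exposes `simple_evaluate` and ships an `mmlu_pro` task, and the checkpoint id should be replaced by the model actually being scored:

```python
import lm_eval

# 0-shot MMLU-Pro; the task name and harness version are assumptions, not from this card.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Na0s/Llama-3.1-8B-Pruned-4-Layers,dtype=bfloat16",
    tasks=["mmlu_pro"],
    num_fewshot=0,
)
print(results["results"])
```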
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
Base model: Na0s/Llama-3.1-8B-Pruned-4-Layers
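The base model is Llama-3.1-8B with 4 decoder layers removed. Which layers were dropped is not stated here; the sketch below removes a contiguous block of deeper layers purely as an illustration of the depth-pruning step:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16
)

# Illustrative choice of layers; the actual pruned indices are not documented here.
layers_to_drop = {24, 25, 26, 27}
model.model.layers = torch.nn.ModuleList(
    [layer for i, layer in enumerate(model.model.layers) if i not in layers_to_drop]
)
model.config.num_hidden_layers = len(model.model.layers)
```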