
Model Card for Na0s/Mixtral-8x7B-Instruct-v0.1-exhaustive-LoRA

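A LoRA fine-tuned version of mistralai/Mixtral-8x7B-Instruct-v0.1. The adapters target the MoE router (gate), the attention projections, and the feed-forward projections listed under Training Hyperparameters.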
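
To give a feel for how small the adapters are, here is a rough parameter count for rank-4 LoRA. It is only a sketch: the layer shapes are assumptions taken from the public Mixtral-8x7B configuration (hidden size 4096, 8 key/value heads of dimension 128, 8 experts, 32 layers), not values stated in this card. The expert feed-forward projections are left out because their module names and shapes can differ between transformers versions. The figure reported by print_trainable_parameters() is the authoritative one.

```python
# Rough count of the parameters LoRA adds at rank r.
# ASSUMPTION: shapes follow the public Mixtral-8x7B config; they are not
# taken from this model card.

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) beside a d_out x d_in weight."""
    return r * (d_in + d_out)

R, ALPHA, LAYERS = 4, 4, 32            # r and lora_alpha as in the LoraConfig
HIDDEN, KV_DIM, EXPERTS = 4096, 1024, 8  # assumed model dimensions

per_layer_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate": (HIDDEN, EXPERTS),  # MoE router
}

per_layer = sum(lora_params(i, o, R) for i, o in per_layer_shapes.values())
total = per_layer * LAYERS
scaling = ALPHA / R  # factor applied to the low-rank update B @ A

print(f"adapter params per layer: {per_layer:,}")
print(f"adapter params, attention + router: {total:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

With lora_alpha equal to r, the scaling factor is 1, so the low-rank update is added to the frozen weight without rescaling.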

Training Hyperparameters

  • Training regime:
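import torch
import transformers
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

# Load the base model with 4-bit weights (bitsandbytes) to reduce memory use.
quantization_config = transformers.BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1", truncation=True, padding=True, padding_side="right")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1", quantization_config=quantization_config)
# Register a dedicated padding token.
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

# Prepare the quantized model for k-bit training before attaching adapters.
model = prepare_model_for_kbit_training(model)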

config = LoraConfig(
    r=4,
    lora_alpha=4,
    target_modules=["gate", "q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.1,
)

lora_model = get_peft_model(model, config)

lora_model.print_trainable_parameters()

dataset = load_dataset("Na0s/sft-ready-Text-Generation-Augmented-Data", split="train")

trainer = SFTTrainer(
    model = lora_model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    packing = True,
    args = TrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 16,
        group_by_length = True,
        warmup_steps = 5,
        bf16 = True,
        max_steps=10000,
        learning_rate = 2e-4,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        eval_strategy="no",
        do_eval=False,
        output_dir = "./outputs",
        push_to_hub=True,
        remove_unused_columns=False,
    )
)

torch.cuda.empty_cache()

trainer.train()
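
The optimizer settings above imply a specific learning-rate curve and effective batch size. The sketch below works them out in plain Python. It assumes the usual linear-warmup-then-cosine shape for lr_scheduler_type = "cosine", decaying to zero at max_steps; the helper name lr_at is made up for illustration. The number of devices and the packed sequence length are not stated in this card, so the counts are per device and in sequences rather than tokens.

```python
import math

# Values copied from the TrainingArguments above.
BASE_LR, WARMUP, TOTAL = 2e-4, 5, 10_000
PER_DEVICE_BATCH, GRAD_ACCUM = 1, 16

def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (hypothetical helper).

    ASSUMPTION: linear warmup for WARMUP steps, then half-cosine decay to 0.
    """
    if step < WARMUP:
        return BASE_LR * step / max(1, WARMUP)
    progress = (step - WARMUP) / max(1, TOTAL - WARMUP)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# Gradient accumulation: 16 micro-batches of 1 sequence per optimizer step.
sequences_per_step = PER_DEVICE_BATCH * GRAD_ACCUM  # per device
total_sequences = sequences_per_step * TOTAL        # per device, whole run

for step in (0, 5, 2_500, 5_000, 10_000):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
print(f"packed sequences per optimizer step (per device): {sequences_per_step}")
print(f"packed sequences over the run (per device): {total_sequences}")
```

With only 5 warmup steps out of 10,000, the run spends almost all of its time on the cosine decay.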

Metrics and results:

Upcoming.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Technical Specifications

Model Architecture and Objective

This MoE-based transformer was fine-tuned in order to implement the expert pruning method described in the paper "A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts".
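
As an illustration of how such a pruning step might look, the sketch below ranks experts by how much the L2 norm of their router weights changed during fine-tuning and proposes the least-changed experts for pruning. This reading of the criterion is an assumption based on the paper's abstract and is not a reproduction of its algorithm. The function names and toy router matrices are hypothetical, and real router weights would come from the "gate" module of each MoE layer before and after fine-tuning.

```python
import math

def l2_norm(row):
    """Euclidean norm of one router weight row."""
    return math.sqrt(sum(x * x for x in row))

def router_norm_change(pretrained, finetuned):
    """Per-expert change in router L2 norm: ||w_finetuned|| - ||w_pretrained||.

    ASSUMPTION: one row of router weights per expert, same order in both inputs.
    """
    return [l2_norm(ft) - l2_norm(pre) for pre, ft in zip(pretrained, finetuned)]

def pruning_order(pretrained, finetuned):
    """Expert indices sorted from smallest to largest norm change."""
    changes = router_norm_change(pretrained, finetuned)
    return sorted(range(len(changes)), key=lambda i: changes[i])

def prune_candidates(pretrained, finetuned, n_prune):
    """The n_prune experts whose router norm changed least."""
    return pruning_order(pretrained, finetuned)[:n_prune]

# Toy router with 3 experts and 2 input features (made-up numbers).
router_before = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
router_after = [[3.0, 4.0], [0.0, 1.0], [1.0, 2.0]]

print(pruning_order(router_before, router_after))
print(prune_candidates(router_before, router_after, n_prune=1))
```

A criterion of this kind would explain why the router ("gate") is included among the LoRA target modules: it needs to be updated during fine-tuning for its norm change to carry any signal.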

Model size: 46.7B params (Safetensors, tensor type BF16)