---
library_name: transformers
datasets:
- Na0s/sft-ready-Text-Generation-Augmented-Data
language:
- en
base_model:
- mistralai/Mixtral-8x7B-Instruct-v0.1
pipeline_tag: text-generation
---

# Model Card for Model ID

LoRA fine-tuned version of mistralai/Mixtral-8x7B-Instruct-v0.1, targeting only the gate/router modules.

#### Training Hyperparameters

- **Training regime:**

```python
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset
from trl import SFTTrainer

# Load the base model in 4-bit so the 8x7B MoE fits in memory
quantization_config = transformers.BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    truncation=True,
    padding=True,
    padding_side="right",
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    quantization_config=quantization_config,
)

# Add a dedicated pad token and resize the embeddings so the new token id is valid
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
model.resize_token_embeddings(len(tokenizer))

model = prepare_model_for_kbit_training(model)

# Apply LoRA only to the router ("gate") modules of the MoE blocks
config = LoraConfig(
    r=4,
    lora_alpha=4,
    target_modules=["gate"],
    lora_dropout=0.1,
)
lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()

dataset = load_dataset("Na0s/sft-ready-Text-Generation-Augmented-Data", split="train")

trainer = SFTTrainer(
    model=lora_model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    packing=True,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        group_by_length=True,
        warmup_steps=5,
        bf16=True,
        max_steps=5000,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        eval_strategy="no",
        do_eval=False,
        output_dir="./outputs",
        push_to_hub=True,
        remove_unused_columns=False,
    ),
)
```

#### Metrics and results

Upcoming.

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

## Technical Specifications

### Model Architecture and Objective

The objective of fine-tuning this MoE-based transformer is to enable the expert pruning detailed in [A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts](https://arxiv.org/abs/2405.16646).
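For illustration, the sketch below shows one way to inspect the fine-tuned router before pruning: it loads the base model with this card's LoRA adapter (the `adapter_id` value is a placeholder for this repository's id), runs a prompt with `output_router_logits=True`, and counts how often each expert is selected. This is a simplified usage-counting heuristic under stated assumptions, not the exact pruning criterion of the cited paper.

```python
# Sketch only: profile expert usage of the fine-tuned router before pruning.
# `adapter_id` is a placeholder; replace it with this adapter's Hub repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
adapter_id = "path/or/repo-of-this-adapter"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
num_experts = model.config.num_local_experts  # 8 for Mixtral-8x7B
top_k = model.config.num_experts_per_tok      # 2 experts routed per token
model = PeftModel.from_pretrained(model, adapter_id)

inputs = tokenizer("A short prompt to profile expert routing.", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_router_logits=True)

# out.router_logits holds one (tokens, num_experts) tensor per MoE layer
usage = torch.zeros(num_experts)
for layer_logits in out.router_logits:
    chosen = layer_logits.topk(top_k, dim=-1).indices
    usage += torch.bincount(chosen.flatten(), minlength=num_experts).cpu().float()

print(usage)  # rarely selected experts are candidates for pruning
```

The cited paper derives a principled criterion for which experts to drop; the counting above only illustrates how the router-only LoRA weights come into play at inference time.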