---
library_name: transformers
datasets:
- Na0s/sft-ready-Text-Generation-Augmented-Data
language:
- en
base_model:
- mistralai/Mixtral-8x7B-Instruct-v0.1
pipeline_tag: text-generation
---

![photo-model](https://i.ibb.co/2kBkwHb/photo-model.webp)

# Model Card: Router-Only LoRA of Mixtral-8x7B-Instruct-v0.1

A LoRA fine-tune of mistralai/Mixtral-8x7B-Instruct-v0.1 that adapts only the MoE gate/router modules (`target_modules=["gate"]`); all other weights are frozen.
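
For inference, the adapter can be loaded on top of the 4-bit quantized base model with PEFT. The snippet below is a minimal sketch: `ADAPTER_ID` is a placeholder for this repository's id, and 4-bit loading with `device_map="auto"` is an assumption about the serving setup, not a requirement.

```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "mistralai/Mixtral-8x7B-Instruct-v0.1"
ADAPTER_ID = "path/or/repo-id-of-this-adapter"  # placeholder: replace with this repo's id

# Load the base model in 4-bit (assumed deployment setup) and attach the router-only LoRA.
quantization_config = transformers.BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    quantization_config=quantization_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)

prompt = "[INST] Explain mixture-of-experts routing in one paragraph. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```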

#### Training Hyperparameters

- **Training regime:** QLoRA-style setup: the base model is loaded in 4-bit via bitsandbytes, training runs in bf16 mixed precision, and the LoRA adapter (r=4, alpha=4, dropout=0.1) targets only the `gate` modules, as shown in the script below.

```python
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer

# Load the tokenizer and the base model in 4-bit.
quantization_config = transformers.BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    truncation=True,
    padding=True,
    padding_side="right",
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    quantization_config=quantization_config,
)

# Add a dedicated padding token and resize the embeddings so the new token id is valid.
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
model.resize_token_embeddings(len(tokenizer))

model = prepare_model_for_kbit_training(model)

# LoRA applied only to the MoE gate/router modules.
config = LoraConfig(
    r=4,
    lora_alpha=4,
    target_modules=["gate"],
    lora_dropout=0.1,
)

lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()

dataset = load_dataset("Na0s/sft-ready-Text-Generation-Augmented-Data", split="train")

trainer = SFTTrainer(
    model=lora_model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    packing=True,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        group_by_length=True,
        warmup_steps=5,
        bf16=True,
        max_steps=5000,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        eval_strategy="no",
        do_eval=False,
        output_dir="./outputs",
        push_to_hub=True,
        remove_unused_columns=False,
    ),
)

trainer.train()
```
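
As a sanity check that only the router is being trained, the trainable parameter names can be inspected after wrapping the model with PEFT. A minimal sketch, assuming the `lora_model` created in the script above:

```python
# With target_modules=["gate"], every trainable parameter should be a LoRA matrix
# attached to a block's router, e.g. "...block_sparse_moe.gate.lora_A.default.weight".
trainable = [name for name, p in lora_model.named_parameters() if p.requires_grad]
for name in trainable[:8]:
    print(name)
assert all("gate" in name and "lora" in name for name in trainable)
print(f"{len(trainable)} trainable tensors, all on the gate/router modules")
```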

#### Metrics and Results

Evaluation results are upcoming.

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
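
For reference, the calculator's estimate reduces to energy consumed (power draw times runtime, optionally scaled by data-center PUE) multiplied by the grid's carbon intensity. The sketch below is illustrative only: every number is a placeholder, not a measurement from this training run.

```python
# Illustrative CO2e estimate (Lacoste et al., 2019): energy (kWh) x carbon intensity.
# All values below are placeholders, NOT the actual statistics of this training run.
gpu_power_kw = 0.4        # assumed average draw of the training GPU(s), in kW
training_hours = 24.0     # placeholder runtime
pue = 1.1                 # assumed data-center power usage effectiveness
carbon_intensity = 0.43   # kg CO2eq per kWh, grid-dependent assumption

energy_kwh = gpu_power_kw * training_hours * pue
co2e_kg = energy_kwh * carbon_intensity
print(f"~{co2e_kg:.1f} kg CO2eq for this hypothetical setup")
```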

## Technical Specifications

### Model Architecture and Objective

The objective of fine-tuning this MoE-based transformer is to implement the expert pruning detailed in the following paper: [A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts](https://arxiv.org/abs/2405.16646).
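
To make the intended downstream step concrete, the sketch below collects per-expert routing statistics from the fine-tuned router on a small calibration batch; pruning methods of this kind then drop the least-used experts per layer. This is a simplified illustration of the general idea, not the exact criterion of the cited paper, and it reuses `lora_model` and `tokenizer` from the training script above; the calibration texts and the number of pruning candidates are assumptions.

```python
import torch

# Rank experts in each layer by how often the fine-tuned router selects them
# on calibration data (simplified illustration, NOT the criterion of arXiv:2405.16646).
calibration_texts = ["..."]  # placeholder calibration set
batch = tokenizer(calibration_texts, return_tensors="pt", padding=True).to(lora_model.device)

with torch.no_grad():
    out = lora_model(**batch, output_router_logits=True)

for layer_idx, logits in enumerate(out.router_logits):
    # logits: (tokens, num_experts); Mixtral routes each token to its top-2 experts.
    top2 = logits.topk(2, dim=-1).indices
    counts = torch.bincount(top2.flatten(), minlength=logits.shape[-1])
    usage = counts.float() / counts.sum()
    least_used = usage.argsort()[:2].tolist()  # pruning candidates for this layer
    print(f"layer {layer_idx}: least-used experts {least_used}, "
          f"usage {[round(u, 3) for u in usage.tolist()]}")
```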