---
library_name: transformers
datasets:
- Na0s/sft-ready-Text-Generation-Augmented-Data
language:
- en
base_model:
- mistralai/Mixtral-8x7B-Instruct-v0.1
pipeline_tag: text-generation
---
<a href="https://ibb.co/G5j5XNh"><img src="https://i.ibb.co/2kBkwHb/photo-model.webp" alt="photo-model" border="0"></a>
# Model Card
LoRA fine-tune of mistralai/Mixtral-8x7B-Instruct-v0.1 in which the adapters target only the MoE gate/router modules.
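The adapter can be loaded on top of the quantized base model with PEFT. The snippet below is a minimal loading sketch; the adapter repository name is a placeholder, substitute this model's actual repo id.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
adapter_id = "Na0s/<this-adapter-repo>"  # placeholder: replace with this model's repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "[INST] Explain mixture-of-experts routing in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```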
#### Training Hyperparameters
- **Training regime:**
```python
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer

# Load the base model in 4-bit and prepare it for k-bit (QLoRA-style) training.
quantization_config = transformers.BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    truncation=True,
    padding=True,
    padding_side="right",
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    quantization_config=quantization_config,
)

# The base tokenizer ships without a pad token; add one and resize the embeddings
# so the new token id is valid.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))
model = prepare_model_for_kbit_training(model)

# LoRA adapters only on the MoE gate/router linear layers.
config = LoraConfig(
    r=4,
    lora_alpha=4,
    target_modules=["gate"],
    lora_dropout=0.1,
)
lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()

dataset = load_dataset("Na0s/sft-ready-Text-Generation-Augmented-Data", split="train")

trainer = SFTTrainer(
    model=lora_model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    packing=True,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        group_by_length=True,
        warmup_steps=5,
        bf16=True,
        max_steps=5000,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        eval_strategy="no",
        do_eval=False,
        output_dir="./outputs",
        push_to_hub=True,
        remove_unused_columns=False,
    ),
)
trainer.train()
```
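For reference, `target_modules=["gate"]` matches one router `Linear(hidden_size, num_local_experts)` per decoder layer. A quick way to check, assuming the Hugging Face Mixtral implementation and its `block_sparse_moe.gate` module paths:

```python
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
with init_empty_weights():
    # Build the module tree on the meta device so no weight memory is allocated.
    model = AutoModelForCausalLM.from_config(config)

gate_modules = [name for name, _ in model.named_modules() if name.endswith(".gate")]
print(len(gate_modules))   # one router per decoder layer (32 for Mixtral-8x7B)
print(gate_modules[0])     # model.layers.0.block_sparse_moe.gate
```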
#### Metrics and Results
Upcoming.
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
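The calculator's estimate reduces to power draw × training time × grid carbon intensity. A back-of-the-envelope sketch with placeholder numbers (the actual GPU count, runtime, and hosting region for this run are not reported here):

```python
# Rough CO2eq estimate following Lacoste et al. (2019); all inputs are placeholders.
gpu_count = 1            # hypothetical
gpu_power_kw = 0.4       # ~400 W per GPU (A100-class card)
hours = 24.0             # hypothetical wall-clock training time
carbon_intensity = 0.4   # kg CO2eq per kWh, depends on the region's grid

energy_kwh = gpu_count * gpu_power_kw * hours
emissions_kg = energy_kwh * carbon_intensity
print(f"{energy_kwh:.1f} kWh ≈ {emissions_kg:.1f} kg CO2eq")
```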
## Technical Specifications
### Model Architecture and Objective
The goal of fine-tuning this MoE-based transformer (router/gate only) is to enable the expert pruning detailed in the following paper: [A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts](https://arxiv.org/abs/2405.16646)
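As a rough illustration of the idea (not the paper's exact criterion), one can score each expert by how much routing probability mass the fine-tuned gate assigns to it on a calibration set, then keep only the highest-scoring experts per layer. The hook-based accounting below assumes the Hugging Face Mixtral module layout (`block_sparse_moe.gate`):

```python
import torch

@torch.no_grad()
def expert_routing_mass(model, tokenizer, texts, num_experts=8):
    """Accumulate softmax routing mass per expert for every MoE layer."""
    stats = {}   # layer name -> tensor of shape (num_experts,)
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Mixtral's gate emits per-token router logits of shape (tokens, num_experts).
            probs = torch.softmax(output.float(), dim=-1)
            stats[name] = stats.get(name, torch.zeros(num_experts)) + probs.sum(dim=0).cpu()
        return hook

    for name, module in model.named_modules():
        if name.endswith("block_sparse_moe.gate"):
            hooks.append(module.register_forward_hook(make_hook(name)))

    for text in texts:
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        model(**inputs)

    for h in hooks:
        h.remove()
    return stats

# Keep the top-k experts per layer by accumulated routing mass, e.g. 6 of 8:
# stats = expert_routing_mass(model, tokenizer, calibration_texts)
# keep = {layer: scores.topk(k=6).indices.tolist() for layer, scores in stats.items()}
```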