---
license: cc-by-nc-4.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- generated_from_trainer
- classification
- Transformer-heads
- finetune
- chatml
- gpt4
- synthetic data
- distillation
model-index:
- name: Mistral_classification_head_qlora
results: []
datasets:
- dair-ai/emotion
language:
- en
library_name: transformers
pipeline_tag: text-generation
---
# Mistral_classification_head_qlora
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e09e72e43b9464c835735f/qna1wMB7CLTe7lfpRy5x3.png)
Mistral_classification_head_qlora attaches a new transformer head to the base model for a sequence classification task; the resulting model was then fine-tuned on the [dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion) dataset using QLoRA. The model was trained for 1 epoch on a single A40 GPU, and the evaluation loss for the attached **emotion-head-3** was **1.313**. The base model used was
* **[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)**

This experiment was performed with the **[transformer-heads](https://github.com/center-for-humans-and-machines/transformer-heads/tree/main)** library.
### Training Script
The training script for attaching a new transformer head for the classification task and fine-tuning it with QLoRA is available here:
[Training Script Colab](https://colab.research.google.com/drive/1rPaG-Q6d_CutPOlKzjsfmPvwebNg_X6i?usp=sharing)
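
For orientation, below is a minimal sketch of the same idea using plain `transformers` + `peft` (QLoRA on a 4-bit base model with a freshly initialized 6-way classification head). The actual training in this card uses the transformer-heads library linked above, whose API differs; the LoRA values (`r`, `lora_alpha`, `target_modules`, dropout) in the sketch are illustrative assumptions, not taken from the original script.

```python
# Illustrative sketch only: the card's training uses the transformer-heads
# library (see the Colab above). This swaps in plain transformers + PEFT to
# show the general QLoRA + classification-head setup on Mistral-7B-Instruct.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          BitsAndBytesConfig)

base = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token

# 4-bit quantized base model with a new 6-way head (dair-ai/emotion has 6 labels).
model = AutoModelForSequenceClassification.from_pretrained(
    base,
    num_labels=6,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
model.config.pad_token_id = tokenizer.pad_token_id

# LoRA adapters on the attention projections; with task_type="SEQ_CLS" PEFT
# also keeps the newly initialized score head trainable.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(
    task_type="SEQ_CLS",
    r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative values
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
))

# Tokenize dair-ai/emotion; training then proceeds with a standard Trainer
# using the hyperparameters listed further down in this card.
dataset = load_dataset("dair-ai/emotion")
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)
```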
### Evaluating the Emotion-Head-3
To evaluate the transformer head attached to the base model, refer to the following Colab notebook:
[Colab Notebook for Evaluation](https://colab.research.google.com/drive/15UpNnoKJIWjG3G_WJFOQebjpUWyNoPKT?usp=sharing)
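
As a rough companion to the notebook, the sketch below evaluates a classification head on the dair-ai/emotion test split with plain `transformers` + `peft`. It assumes the adapter from the training sketch above was saved to `emotion_linear_probe` (the `output_dir` listed in the hyperparameters below); a transformer-heads checkpoint would be loaded differently, as shown in the notebook.

```python
# Minimal evaluation sketch (not the card's notebook): reports eval loss and
# accuracy of a sequence-classification head on the dair-ai/emotion test split.
import numpy as np
import torch
from datasets import load_dataset
from peft import PeftModel
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_dir = "emotion_linear_probe"  # assumed local checkpoint from training

tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(
    base, num_labels=6, torch_dtype=torch.bfloat16, device_map="auto"
)
model.config.pad_token_id = tokenizer.pad_token_id
model = PeftModel.from_pretrained(model, adapter_dir)  # restores LoRA + score head

test = load_dataset("dair-ai/emotion", split="test")
test = test.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="eval_out", per_device_eval_batch_size=4),
    eval_dataset=test,
    tokenizer=tokenizer,  # lets the Trainer pad batches dynamically
    compute_metrics=accuracy,
)
print(trainer.evaluate())  # includes eval_loss and eval_accuracy
```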
### Training hyperparameters
The following hyperparameters were used during training:
- output_dir: emotion_linear_probe
- learning_rate: 0.00002
- num_train_epochs: 1
- eval_epochs: 1
- logging_steps: 1
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- do_eval: False
- remove_unused_columns: False
- optim: paged_adamw_32bit
- gradient_checkpointing: True
- lr_scheduler_type: constant
- ddp_find_unused_parameters: False
- report_to: wandb
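
For reference, a sketch of how these settings map onto a `transformers.TrainingArguments` object, reconstructed from the list above rather than copied from the original script (`eval_epochs` is not a `TrainingArguments` field and is handled by the training loop instead, so it is omitted here):

```python
# Reconstruction from the hyperparameter list above, not the original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="emotion_linear_probe",
    learning_rate=0.00002,
    num_train_epochs=1,
    logging_steps=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    do_eval=False,
    remove_unused_columns=False,
    optim="paged_adamw_32bit",
    gradient_checkpointing=True,
    lr_scheduler_type="constant",
    ddp_find_unused_parameters=False,
    report_to=["wandb"],
)
```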
### Framework versions
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.17.0
- Tokenizers 0.15.0
- transformer-heads