---
license: cc-by-nc-4.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- generated_from_trainer
- classification
- Transformer-heads
- finetune
- chatml
- gpt4
- synthetic data
- distillation
model-index:
- name: Mistral_classification_head_qlora
  results: []
datasets:
- dair-ai/emotion
language:
- en
library_name: transformers
pipeline_tag: text-generation
---
# Mistral_classification_head_qlora

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e09e72e43b9464c835735f/qna1wMB7CLTe7lfpRy5x3.png)
Mistral_classification_head_qlora has a new transformer head attached to it for sequence classification, and the resulting model has been fine-tuned on the [dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion) dataset using QLoRA. The model was trained for 1 epoch on a single A40 GPU. The evaluation loss for the attached **emotion-head-3** was **1.313**. The base model used was

* **[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)**

This experiment was performed with the **[transformer-heads](https://github.com/center-for-humans-and-machines/transformer-heads/tree/main)** library.
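
The linked notebooks below show the exact transformer-heads API; as a rough orientation only, the sketch that follows reproduces the same general recipe (4-bit quantized base model, LoRA adapters, and a freshly initialized classification head) with plain `transformers` + `peft` rather than transformer-heads. The LoRA hyperparameters shown are illustrative placeholders and are not taken from this card.

```python
# Illustrative sketch only -- NOT the transformer-heads code used for this model.
# It shows the same overall recipe: 4-bit base model + LoRA + a new classification head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "mistralai/Mistral-7B-Instruct-v0.2"
num_labels = 6  # dair-ai/emotion: sadness, joy, love, anger, fear, surprise

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# num_labels adds a randomly initialized classification head on top of the base model
model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=num_labels,
    quantization_config=bnb_config,
    device_map="auto",
)
model.config.pad_token_id = tokenizer.pad_token_id

# Prepare the quantized model for training and attach LoRA adapters
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                # placeholder values, not taken from this card
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="SEQ_CLS",
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```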
### Training Script

The training script for attaching a new transformer head for a classification task and fine-tuning it with QLoRA is available here:

[Training Script Colab](https://colab.research.google.com/drive/1rPaG-Q6d_CutPOlKzjsfmPvwebNg_X6i?usp=sharing)
### Evaluating the Emotion-Head-3

To evaluate the transformer head that has been attached to the base model, refer to the following Colab notebook:

[Colab Notebook for Evaluation](https://colab.research.google.com/drive/15UpNnoKJIWjG3G_WJFOQebjpUWyNoPKT?usp=sharing)
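
Outside the notebook, a quick sanity check against the emotion test split can be as simple as the sketch below. The `predict` function is a hypothetical placeholder for however you obtain label predictions from the headed model (the notebook above shows the actual transformer-heads inference code); only the dataset handling and the accuracy computation here are concrete.

```python
from datasets import load_dataset

# dair-ai/emotion test split: 2,000 examples with "text" and integer "label" columns
emotion_test = load_dataset("dair-ai/emotion", split="test")
texts, labels = emotion_test["text"], emotion_test["label"]

# `predict` is a placeholder: it should map a list of strings to predicted label ids
preds = predict(texts)

accuracy = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
print(f"Test accuracy on dair-ai/emotion: {accuracy:.3f}")
```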
### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- num_train_epochs: 1
- eval_epochs: 1
- logging_steps: 1
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- lr_scheduler_type: constant
- optim: paged_adamw_32bit
- gradient_checkpointing: True
- do_eval: False
- remove_unused_columns: False
- ddp_find_unused_parameters: False
- output_dir: emotion_linear_probe
- report_to: wandb
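
For reference, these settings map onto `transformers.TrainingArguments` roughly as follows (a sketch reconstructed from the values listed above, not the exact notebook code):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="emotion_linear_probe",
    learning_rate=2e-5,
    num_train_epochs=1,
    logging_steps=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    lr_scheduler_type="constant",
    optim="paged_adamw_32bit",
    gradient_checkpointing=True,
    do_eval=False,
    remove_unused_columns=False,
    ddp_find_unused_parameters=False,
    report_to=["wandb"],
)
```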
### Framework versions

- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.17.0
- Tokenizers 0.15.0
- Transformer-heads