---
title: >-
  ERA SESSION27 - Phi2 Model Finetuning with QLoRA on OpenAssistant
  Conversations Dataset (OASST1)
emoji: 💻
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 4.14.0
app_file: app.py
pinned: false
license: mit
---
This is an implementation of Phi2 model finetuning using the QLoRA strategy on the OpenAssistant Conversations Dataset (OASST1).
- Dataset used to finetune: OpenAssistant Conversations Dataset (OASST1)
- ChatML-modified OASST1 dataset: RaviNaik/oasst1-chatml
- Finetuned model: RaviNaik/Phi2-Osst
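A minimal sketch of loading the ChatML-formatted dataset with the `datasets` library; the split name and column layout are assumptions, so check the dataset card:

```python
from datasets import load_dataset

# Load the ChatML-formatted OASST1 dataset from the Hugging Face Hub.
dataset = load_dataset("RaviNaik/oasst1-chatml")

print(dataset)              # available splits and columns
print(dataset["train"][0])  # one ChatML-formatted example (assumed "train" split)
```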
Tasks:
- :heavy_check_mark: Use OpenAssistant dataset.
- :heavy_check_mark: Finetune Microsoft Phi2 model.
- :heavy_check_mark: Use QLoRA strategy (a setup sketch follows this list).
- :heavy_check_mark: Create an App on HF space using finetuned model.
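For reference, a minimal sketch of a typical QLoRA setup with bitsandbytes and PEFT. The 4-bit NF4 recipe is standard; the rank, alpha, and dropout values are illustrative rather than the exact ones used for this run, and the target modules correspond to the Linear4bit layers visible in the model description below.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization, the standard QLoRA recipe.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    trust_remote_code=True,  # Phi2 shipped custom modeling code at the time
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # Phi2's tokenizer has no pad token by default

# LoRA adapters on the attention and MLP projections that appear as
# Linear4bit layers in the model description below; r/alpha/dropout
# are illustrative values, not necessarily those used for this run.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["Wqkv", "out_proj", "fc1", "fc2"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```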
Phi2 Model Description:
```
PhiForCausalLM(
  (transformer): PhiModel(
    (embd): Embedding(
      (wte): Embedding(51200, 2560)
      (drop): Dropout(p=0.0, inplace=False)
    )
    (h): ModuleList(
      (0-31): 32 x ParallelBlock(
        (ln): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (resid_dropout): Dropout(p=0.1, inplace=False)
        (mixer): MHA(
          (rotary_emb): RotaryEmbedding()
          (Wqkv): Linear4bit(in_features=2560, out_features=7680, bias=True)
          (out_proj): Linear4bit(in_features=2560, out_features=2560, bias=True)
          (inner_attn): SelfAttention(
            (drop): Dropout(p=0.0, inplace=False)
          )
          (inner_cross_attn): CrossAttention(
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (mlp): MLP(
          (fc1): Linear4bit(in_features=2560, out_features=10240, bias=True)
          (fc2): Linear4bit(in_features=10240, out_features=2560, bias=True)
          (act): NewGELUActivation()
        )
      )
    )
  )
  (lm_head): CausalLMHead(
    (ln): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
    (linear): Linear(in_features=2560, out_features=51200, bias=True)
  )
  (loss): CausalLMLoss(
    (loss_fct): CrossEntropyLoss()
  )
)
```
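The Linear4bit layers in this printout come from the 4-bit quantization. Continuing from the QLoRA sketch above, the module tree and the trainable-parameter fraction can be inspected like this:

```python
# Continuing from the QLoRA setup sketched earlier: printing the
# quantized model reproduces a module tree like the one above, with
# Linear4bit layers in place of the original nn.Linear projections.
print(model)

# Rough parameter accounting; the exact numbers depend on the LoRA rank.
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable: {trainable:,} / total: {total:,} ({100 * trainable / total:.2f}%)")
```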
Training Loss Curve: (plot image not reproduced here)
Training Output:
```
TrainOutput(global_step=500, training_loss=1.4746462078094482, metrics={'train_runtime': 4307.6684, 'train_samples_per_second': 3.714, 'train_steps_per_second': 0.116, 'total_flos': 6.667526640623616e+16, 'train_loss': 1.4746462078094482, 'epoch': 1.62})
```
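For context, a training call of roughly this shape (trl's SFTTrainer, continuing from the sketches above) yields a TrainOutput like the one shown. `max_steps=500` matches the output above; the remaining hyperparameters are assumptions, not the exact configuration of this run.

```python
from transformers import TrainingArguments
from trl import SFTTrainer

# max_steps=500 matches the TrainOutput above; the other values are
# illustrative assumptions.
training_args = TrainingArguments(
    output_dir="phi2-oasst1-qlora",
    max_steps=500,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    logging_steps=10,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,                     # QLoRA-wrapped Phi2 from the earlier sketch
    args=training_args,
    train_dataset=dataset["train"],  # ChatML dataset loaded earlier
    dataset_text_field="text",       # assumed column name; check the dataset card
    max_seq_length=1024,
    tokenizer=tokenizer,
)

train_output = trainer.train()
print(train_output)  # e.g. TrainOutput(global_step=500, training_loss=..., metrics={...})
```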