metadata
library_name: transformers
license: mit
datasets:
  - heegyu/open-korean-instructions
language:
  - ko
tags:
  - Llama-3
  - LoRA
  - MLP-KTLim/llama-3-Korean-Bllossom-8B

Fine-tuning of the MLP-KTLim/llama-3-Korean-Bllossom-8B model

(TREX-Lab at Seoul Cyber University)

Summary

  • Base Model : MLP-KTLim/llama-3-Korean-Bllossom-8B
  • Dataset : heegyu/open-korean-instructions (10%)
  • Tuning Method
    • PEFT (Parameter-Efficient Fine-Tuning)
    • LoRA (Low-Rank Adaptation of Large Language Models)
  • Related Articles : https://arxiv.org/abs/2106.09685, https://arxiv.org/pdf/2403.10882
  • The base model was fine-tuned on a random 10% sample of the Korean chatbot dataset (open-korean-instructions)
  • Goal : test whether a large language model can be fine-tuned on a single A30 GPU (it can; the run completed successfully)
  • Developed by : TREX-Lab at Seoul Cyber University
  • Language(s) (NLP) : Korean
  • Finetuned from model : MLP-KTLim/llama-3-Korean-Bllossom-8B
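
The random 10% subset mentioned in the summary could be drawn along the following lines. This is only a minimal sketch: this card does not state how the sample was actually taken, and the function name `sample_fraction`, the seed value, and the stand-in `rows` list are all hypothetical.

```python
import random


def sample_fraction(rows, fraction=0.1, seed=42):
    """Return a reproducible random subset containing `fraction` of `rows`.

    The seed is an illustrative assumption; the card does not report one.
    """
    rng = random.Random(seed)          # private RNG, so global state is untouched
    k = int(len(rows) * fraction)      # number of examples to keep
    return rng.sample(rows, k)         # sampling without replacement


# Stand-in for the instruction dataset: 1,000 dummy records.
rows = list(range(1000))
subset = sample_fraction(rows)
print(len(subset))
```

Fixing the seed makes the subset reproducible, which helps when comparing runs on the same 10% slice.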

Fine-Tuning Details

  • LoRA alpha (lora_alpha) : 16
  • LoRA rank (r) : 64 (in hindsight this seems rather large)
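from peft import LoraConfig

# LoRA adapter settings: rank-64 update matrices, scaled by lora_alpha / r
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias='none',
    task_type='CAUSAL_LM'
)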
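
To put the alpha and r values above in perspective, the sketch below works out the LoRA scaling factor (`lora_alpha / r`, the default scaling in PEFT) and the number of trainable parameters a rank-r adapter adds to a single weight matrix. The 4096 x 4096 matrix shape is an illustrative assumption, not a statement about which modules were actually adapted in this run.

```python
def lora_scaling(lora_alpha: int, r: int) -> float:
    """Factor applied to the low-rank update B @ A (default LoRA scaling)."""
    return lora_alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted matrix: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


scaling = lora_scaling(lora_alpha=16, r=64)

# Hypothetical square projection matrix, chosen only for illustration.
d_in, d_out = 4096, 4096
adapter_params = lora_param_count(d_in, d_out, r=64)
full_params = d_in * d_out

print(f"scaling factor     : {scaling}")
print(f"adapter parameters : {adapter_params:,}")
print(f"share of full matrix: {adapter_params / full_params:.2%}")
```

With alpha smaller than r, the adapter update is scaled down (here by 0.25), so a large rank does not by itself mean a large update.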
  • Quantization : 4-bit NF4 with double quantization (bnb_4bit_use_double_quant), float16 compute dtype
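from transformers import BitsAndBytesConfig

# Load the base weights in 4-bit NF4 with double quantization; compute in float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype='float16',
)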
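
A rough back-of-the-envelope estimate shows why 4-bit loading matters on a single 24 GB A30. The sketch assumes about 8 billion parameters, and takes the block sizes (64 for the first level, 256 for the second) from the QLoRA paper rather than from this run. It ignores layers that stay in higher precision, as well as activations, optimizer state, and the LoRA weights, so the numbers are lower bounds for weight storage only.

```python
GIB = 2 ** 30


def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Memory needed to store n_params weights at the given bit width."""
    return n_params * bits_per_param / 8 / GIB


n_params = 8e9  # assumption: roughly 8 billion parameters

fp16_gib = weight_memory_gib(n_params, 16)
nf4_gib = weight_memory_gib(n_params, 4)

# Per-parameter overhead of the quantization constants (QLoRA paper defaults):
# one 32-bit constant per block of 64 weights without double quantization,
# versus 8-bit constants plus a 32-bit second-level constant per 256 blocks.
overhead_plain = 32 / 64
overhead_double = 8 / 64 + 32 / (64 * 256)

print(f"fp16 weights : {fp16_gib:.1f} GiB")
print(f"NF4 weights  : {nf4_gib:.1f} GiB")
print(f"constants    : {overhead_plain:.3f} -> {overhead_double:.3f} bits/param")
```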
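from trl import SFTTrainer

# peft_model, dataset, tokenizer, and training_args are assumed to be defined
# earlier in the training script (not shown in this card).
# Argument names follow the TRL version used for this run; newer releases may differ.
trainer = SFTTrainer(
    model=peft_model,
    train_dataset=dataset,
    dataset_text_field='text',
    max_seq_length=min(tokenizer.model_max_length, 2048),
    tokenizer=tokenizer,
    packing=True,  # pack several short examples into each 2048-token sequence
    args=training_args
)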

Training Result

Time taken : 21h 45m 55s
TrainOutput(global_step=816, training_loss=1.718194248045192,
            metrics={'train_runtime': 78354.6002,
                     'train_samples_per_second': 0.083,
                     'train_steps_per_second': 0.01,
                     'train_loss': 1.718194248045192,
                     'epoch': 2.99})
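
The reported metrics can be cross-checked with a little arithmetic. The sketch below uses only the numbers in the TrainOutput above. Because `train_samples_per_second` is rounded in the log, the derived sample count and effective batch size are approximations, not values stated by the authors.

```python
from datetime import timedelta

# Values copied from the TrainOutput above.
train_runtime = 78354.6002        # seconds
global_step = 816
samples_per_second = 0.083        # rounded in the log
epochs = 2.99

wall_clock = timedelta(seconds=round(train_runtime))
seconds_per_step = train_runtime / global_step
approx_samples = samples_per_second * train_runtime
approx_batch_size = approx_samples / global_step
steps_per_epoch = global_step / epochs

print(f"wall clock          : {wall_clock}")
print(f"seconds per step    : {seconds_per_step:.1f}")
print(f"approx. sequences   : {approx_samples:.0f}")
print(f"approx. batch size  : {approx_batch_size:.1f}")
print(f"steps per epoch     : {steps_per_epoch:.0f}")
```

The wall-clock figure matches the 21h 45m 55s reported above, and the ratio of samples to steps suggests an effective batch size of roughly 8 packed sequences per optimizer step.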