komt: Korean multi-task instruction tuning model

[Figure: multi-task instruction tuning]

Following the success of ChatGPT, numerous large language models have emerged in an attempt to match its capabilities. However, many of these models still struggle to answer accurately or to generate fluent text in Korean. This study addresses the problem with a multi-task instruction technique that converts supervised datasets from various tasks into training data for large language models (LLMs).

Model Details

  • Model Developers : davidkim(changyeon kim)
  • Repository : https://github.com/davidkim205/komt
  • Model Architecture : komt-mistral-7b-v1-dpo is a fine-tuned version of komt-mistral-7b-v1 (original model : Mistral-7B-Instruct-v0.1).

Dataset

  • maywell/ko_Ultrafeedback_binarized

Hardware and Software

  • nvidia driver : 535.54.03
  • CUDA Version: 12.2

Training

Refer to https://github.com/davidkim205/komt

Prompt template: Mistral

<s>[INST] {prompt} [/INST]</s>
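
As a minimal sketch of how a single user message could be wrapped in the template above: the helper name build_prompt is hypothetical, and the sketch assumes that the tokenizer prepends `<s>` on its own and that `</s>` is emitted by the model at the end of its answer, so neither token is written into the string.

```python
def build_prompt(user_message: str) -> str:
    """Wrap one user message in Mistral-style instruction tags.

    ASSUMPTION: <s> and </s> are handled by the tokenizer and the model,
    so they are left out of the returned string.
    """
    return f"[INST] {user_message.strip()} [/INST]"


print(build_prompt("Plan a two-day trip to Jeju Island."))
```

Keeping the template in one function means the same string can be reused for both generation and for locating the end of the prompt in the decoded output.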

Usage

import torch

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel, PeftConfig
from transformers import TextStreamer, GenerationConfig


# Base model and the PEFT adapter that is applied on top of it
base_model_name = 'davidkim205/komt-mistral-7b-v1'
peft_model_name = 'davidkim205/komt-mistral-7b-v1-dpo'
config = PeftConfig.from_pretrained(peft_model_name)
# Load the base weights in 4-bit NF4 with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
config.base_model_name_or_path = base_model_name
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, quantization_config=bnb_config, device_map="auto")
model = PeftModel.from_pretrained(model, peft_model_name)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer)

def gen(x):
    # Sampling settings used for generation
    generation_config = GenerationConfig(
        temperature=0.8,
        top_p=0.8,
        top_k=100,
        max_new_tokens=1024,
        early_stopping=True,
        do_sample=True,
    )
    # Wrap the user input in the Mistral instruction tags
    q = f"[INST]{x} [/INST]"
    gened = model.generate(
        **tokenizer(
            q,
            return_tensors='pt',
            return_token_type_ids=False
        ).to('cuda'),
        generation_config=generation_config,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer,
    )
    result_str = tokenizer.decode(gened[0])

    # Keep only the text that follows the closing instruction tag
    start_tag = "[/INST]"
    start_index = result_str.find(start_tag)

    if start_index != -1:
        result_str = result_str[start_index + len(start_tag):].strip()
    return result_str

# Prompt (Korean): "I want to take a two-day, one-night solo trip to Jeju Island; please plan an itinerary for me."
result = gen('제주도를 1박2일로 혼자 여행하려고 하는데 여행 코스를 만들어줘')

print('##########')
print(result)

Output

제주도 1박2일 1인 여행 코스
제주도는 한국에서 가장 먼 섬인 동남아시아 최대 섬으로, 멋진 해변, 아름다운 자연경관, 절경 납땜 절벽, 한국 최대 규모의 복합리조트 등 다양한 관광 명소가 풍부하게 있어 1박2일로 혼자 여행하시는 여러분들을 위해 아래와 같은 코스를 제안해 드리겠습니다.

▷ 코스 1 : 성산일출봉, 용눈이절벽, 성산일출봉 야간 경관 관람
- 코스 설명 : 제주 동남쪽 해안의 명소인 성산일출봉, 용눈이절벽, 성산일출봉 야간 경관 관람 순으로 구성된 코스입니다. 아침에 일찍 일어나 일출봉에 도착하여 일출을 감상하고, 아침 식사를 하고 절벽 등반을 즐기며 휴식을 취합니다. 오후에는 일출봉 야간 경관 관람을 즐기며 휴식과 휴식을 취합니다.

▷ 코스 2 : 한라산, 한라산 케이블카, 오미자 바위, 신라 이젠
- 코스 설명 : 제주 남부의 명소인 한라산, 한라산 케이블카, 오미자 바위, 신라 이젠 순으로 구성된 코스입니다. 아침에 일찍 일어나 한라산 케이블카를 타고 높은 고지에 위치한 한라산 정상으로 올라가서 탐험을 즐기며 아침 식사를 합니다. 오후에는 오미자 바위를 찾아 휴식과 휴식을 취하고, 일출봉 야간 경관 관람을 즐기며 휴식을 취합니다.

▷ 코스 3 : 대하늘길, 삼거리, 곰돌라비, 칠동굴, 광안절, 칠금절, 해넘이길, 바다지상 길
- 코스 설명 : 제주 서부의 명소인 대하늘길, 삼거리, 곰돌라비, 칠동굴, 광안절, 칠금절, 해넘이길, 바다지상 길 순으로 구성된 코스입니다. 아침에 일찍 일어나 대하늘길에서 탐험을 즐기며 아침 식사를 합니다. 오후에는 삼거리를 찾아 휴식과 휴식을 취하고, 일출봉 야간 경관 관람을 즐기며 휴식을 취합니다.
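
The post-processing step at the end of gen in the Usage section can be isolated into a small pure function, which makes it possible to check the string handling without loading the model. This is a sketch rather than the repository's code: extract_answer is a hypothetical name, and stripping a trailing `</s>` assumes the decoded string keeps the end-of-sequence marker as literal text.

```python
def extract_answer(decoded: str, end_tag: str = "[/INST]", eos: str = "</s>") -> str:
    """Return the text that follows the first end_tag, without a trailing EOS marker.

    If end_tag is not present, the whole string is treated as the answer.
    """
    idx = decoded.find(end_tag)
    answer = decoded[idx + len(end_tag):] if idx != -1 else decoded
    answer = answer.strip()
    # ASSUMPTION: the EOS token, if decoded, appears as the literal text "</s>"
    if answer.endswith(eos):
        answer = answer[: -len(eos)].rstrip()
    return answer


print(extract_answer("<s> [INST]hello [/INST] Hi there.</s>"))
```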


Evaluation

For objective model evaluation, we initially used EleutherAI's lm-evaluation-harness but obtained unsatisfactory results. We therefore evaluated the models with ChatGPT, a widely used model, following the approach described in "Self-Alignment with Instruction Backtranslation" and "Three Ways of Using Large Language Models to Evaluate Chat".

model                                     score  average(0~5)  percentage
gpt-3.5-turbo(close)                      147    3.97          79.45%
naver Cue(close)                          140    3.78          75.67%
clova X(close)                            136    3.67          73.51%
WizardLM-13B-V1.2(open)                   96     2.59          51.89%
Llama-2-7b-chat-hf(open)                  67     1.81          36.21%
Llama-2-13b-chat-hf(open)                 73     1.91          38.37%
nlpai-lab/kullm-polyglot-12.8b-v2(open)   70     1.89          37.83%
kfkas/Llama-2-ko-7b-Chat(open)            96     2.59          51.89%
beomi/KoAlpaca-Polyglot-12.8B(open)       100    2.70          54.05%
komt-llama2-7b-v1 (open)(ours)            117    3.16          63.24%
komt-llama2-13b-v1 (open)(ours)           129    3.48          69.72%
komt-llama-30b-v1 (open)(ours)            129    3.16          63.24%
komt-mistral-7b-v1 (open)(ours)           131    3.54          70.81%
komt-mistral-7b-v1-dpo (open)(ours)       142    3.83          76.75%
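
The average and percentage columns appear to be derived from the score column. As a hedged sketch: most rows are reproduced by dividing the score by 37 (a question count inferred from the numbers, not stated in this card), dividing that average by the maximum rating of 5, and truncating both values to two decimals. The function name derive_columns and the constant 37 are assumptions made for illustration.

```python
MAX_RATING = 5       # upper end of the 0~5 scale in the table header
NUM_QUESTIONS = 37   # ASSUMPTION: inferred from the table, not stated in the card


def derive_columns(score: int) -> tuple[float, float]:
    """Return (average, percentage) for a total score, truncated to two decimals.

    Integer arithmetic is used so that truncation is exact.
    """
    avg_hundredths = score * 100 // NUM_QUESTIONS
    pct_hundredths = score * 10000 // (NUM_QUESTIONS * MAX_RATING)
    return avg_hundredths / 100, pct_hundredths / 100


print(derive_columns(142))  # komt-mistral-7b-v1-dpo row → (3.83, 76.75)
```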