Commit 0a95e86 (parent: 873671c) by davidkim205: Create README.md
---
language:
- en
- ko
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- llama-2-chat
library_name: peft
---
# komt: Korean multi-task instruction tuning model
![multi task instruction tuning.jpg](https://github.com/davidkim205/komt/assets/16680469/c7f6ade7-247e-4b62-a94f-47e19abea68e)

Following the success of ChatGPT, numerous large language models have emerged that try to match its capabilities.
However, many of these models still struggle to answer accurately in Korean or to generate fluent Korean text.
This work addresses these challenges with a multi-task instruction technique that turns supervised datasets from various tasks into training data for Large Language Models (LLMs).

## Model Details

* **Model Developers** : davidkim (Changyeon Kim)
* **Repository** : https://github.com/davidkim205/komt
* **Model Architecture** : komt-mistral-7b-v1-dpo is a fine-tuned version of komt-mistral-7b-v1 (original model: Mistral-7B-Instruct-v0.1).

## Dataset
* maywell/ko_Ultrafeedback_binarized
  https://huggingface.co/datasets/maywell/ko_Ultrafeedback_binarized

## Hardware and Software
- NVIDIA driver: 535.54.03
- CUDA version: 12.2

## Training
Refer to https://github.com/davidkim205/komt

## Prompt template: Mistral
```
<s>[INST] {prompt} [/INST]</s>
```
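
As a minimal sketch, the template above can be filled in with a small helper. The names `TEMPLATE` and `build_prompt` are hypothetical and not part of the komt repository; the literal `<s>` and `</s>` markers are included only to mirror the template as written, whereas the usage code in this README passes just the `[INST] ... [/INST]` part to the tokenizer.

```python
# Hypothetical helper that mirrors the prompt template shown above.
TEMPLATE = "<s>[INST] {prompt} [/INST]</s>"


def build_prompt(prompt: str, template: str = TEMPLATE) -> str:
    """Fill the Mistral-style template with a single user prompt."""
    return template.format(prompt=prompt.strip())


print(build_prompt("Plan a two-day trip to Jeju Island."))
# <s>[INST] Plan a two-day trip to Jeju Island. [/INST]</s>
```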

## Usage
```
import torch

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel, PeftConfig
from transformers import TextStreamer, GenerationConfig


# Base model and the PEFT adapter that is loaded on top of it
base_model_name = 'davidkim205/komt-mistral-7b-v1'
peft_model_name = 'davidkim205/komt-mistral-7b-v1-dpo'
config = PeftConfig.from_pretrained(peft_model_name)
# 4-bit NF4 quantization with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
config.base_model_name_or_path = base_model_name
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, quantization_config=bnb_config, device_map="auto")
model = PeftModel.from_pretrained(model, peft_model_name)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
streamer = TextStreamer(tokenizer)

def gen(x):
    generation_config = GenerationConfig(
        temperature=0.8,
        top_p=0.8,
        top_k=100,
        max_new_tokens=1024,
        early_stopping=True,
        do_sample=True,
    )
    q = f"[INST]{x} [/INST]"
    gened = model.generate(
        **tokenizer(
            q,
            return_tensors='pt',
            return_token_type_ids=False
        ).to('cuda'),
        generation_config=generation_config,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer,
    )
    result_str = tokenizer.decode(gened[0])

    # Keep only the text that follows the [/INST] tag
    start_tag = "[/INST]"
    start_index = result_str.find(start_tag)

    if start_index != -1:
        result_str = result_str[start_index + len(start_tag):].strip()
    return result_str

# Prompt (Korean): "I'm planning a solo 2-day, 1-night trip to Jeju Island; please make a travel itinerary."
result = gen('제주도를 1박2일로 혼자 여행하려고 하는데 여행 코스를 만들어줘')

print('##########')
print(result)
```
output (in Korean; the model recommends three courses: Jeju's west coast, its east coast, and the south around Hallasan National Park)
```
<s> [INST]제주도를 1박2일로 혼자 여행하려고 하는데 여행 코스를 만들어줘 [/INST]제주도의 아름다운 자연 경관과 문화유산을 느끼고 싶다면 다음과 같은 코스를 추천한다.

첫 번째 코스는 아름다운 해안 경관과 해안사구를 감상하기 위해 제주 서부 해안으로 이동하는 것이다. 제주 시내에서 왼쪽 방향으로 탐나면 한림해수욕장, 성산해수욕장, 봉동해수욕장 등 유명한 해수욕장을 경유할 수 있다. 이 지역은 맑은 바다와 넓은 해안에서 해수욕을 즐길 수 있으며, 해수욕장 주변에는 많은 음식점이 있어 배식을 즐길 수 있다. 서쪽 해안으로 이동하는 동안 제주 대표 사계절 맛집인 허브 수프 및 소라비 등 맛있는 음식을 맛볼 수 있다. 서부 해안을 돌아 다시 제주 시내로 돌아오는 동안 제주 특산품 시장에서 제주 특산품을 살 수 있다.

두 번째 코스는 동부 해안을 돌아보는 것이다. 제주 시내에서 오른쪽 방향으로 탐나면 아이스크림 거리인 한림해수욕장, 성산해수욕장, 봉동해수욕장 등 다시 한 번 유명한 해수욕장을 경유할 수 있다. 이 지역은 해수욕장 주변에는 많은 음식점이 있어 배식을 즐길 수 있다. 동부 해안을 돌아 다시 제주 시내로 돌아오는 동안 제주 특산품 시장에서 제주 특산품을 살 수 있다. 이 지역에는 많은 음식점이 있어 맛있는 음식을 맛볼 수 있다.

세 번째 코스는 제주 남부로 이동하는 것이다. 제주 시내에서 오른쪽 방향으로 탐나면 제주 남부로 이동할 수 있다. 이 지역은 한라산 국립공원이 위치해 있어 자연 경관을 감상할 수 있다. 한라산 국립공원 내에는 다양한 자연 경관과 산악 경로를 즐길 수 있는 탐방 코스가 있다. 또한, 제주 남부는 많은 해수욕장과 골프장이 위치해 있어 해수욕과 골프를 즐길 수 있다. 남부로 이동하는 동안 제주 특산품 시장에서 제주 특산품을 살 수 있다.
```
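
The post-processing at the end of `gen` can be tried on its own, without loading the model. The sketch below applies the same `[/INST]` split to an already decoded string. The helper name `extract_answer` and the optional removal of a trailing `</s>` marker are assumptions added for illustration; the original code does not strip the end-of-sequence marker.

```python
def extract_answer(decoded: str, start_tag: str = "[/INST]", eos: str = "</s>") -> str:
    """Return the text after the first start_tag, without a trailing EOS marker."""
    start_index = decoded.find(start_tag)
    if start_index != -1:
        # Same slicing as in gen(): drop everything up to and including the tag.
        decoded = decoded[start_index + len(start_tag):]
    decoded = decoded.strip()
    # Assumption: also drop a trailing end-of-sequence marker if one is present.
    if eos and decoded.endswith(eos):
        decoded = decoded[: -len(eos)].rstrip()
    return decoded


sample = "<s> [INST]hello [/INST]Hi there.</s>"
print(extract_answer(sample))  # Hi there.
```

If the tag is missing, the input is returned unchanged apart from whitespace trimming, which matches the fallback in `gen`.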
## Evaluation
For objective model evaluation, we first tried EleutherAI's lm-evaluation-harness, but the results were unsatisfactory. We therefore used ChatGPT, a widely used model, as the evaluator, following [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06502.pdf) and [Three Ways of Using Large Language Models to Evaluate Chat](https://arxiv.org/pdf/2308.06259.pdf).

| model                                    | score   | average (0~5) | percentage |
|------------------------------------------|---------|---------------|------------|
| gpt-3.5-turbo (closed)                   | 147     | 3.97          | 79.45%     |
| naver Cue (closed)                       | 140     | 3.78          | 75.67%     |
| clova X (closed)                         | 136     | 3.67          | 73.51%     |
| WizardLM-13B-V1.2 (open)                 | 96      | 2.59          | 51.89%     |
| Llama-2-7b-chat-hf (open)                | 67      | 1.81          | 36.21%     |
| Llama-2-13b-chat-hf (open)               | 73      | 1.91          | 38.37%     |
| nlpai-lab/kullm-polyglot-12.8b-v2 (open) | 70      | 1.89          | 37.83%     |
| kfkas/Llama-2-ko-7b-Chat (open)          | 96      | 2.59          | 51.89%     |
| beomi/KoAlpaca-Polyglot-12.8B (open)     | 100     | 2.70          | 54.05%     |
| **komt-llama2-7b-v1 (open)(ours)**       | **117** | **3.16**      | **63.24%** |
| **komt-llama2-13b-v1 (open)(ours)**      | **129** | **3.48**      | **69.72%** |
| **komt-llama-30b-v1 (open)(ours)**       | **129** | **3.16**      | **63.24%** |
| **komt-mistral-7b-v1 (open)(ours)**      | **131** | **3.54**      | **70.81%** |
| **komt-mistral-7b-v1-dpo (open)(ours)**  | **142** | **3.83**      | **76.75%** |
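
The three numeric columns appear to be related by simple arithmetic. The sketch below assumes that each model answered 37 prompts scored from 0 to 5 and that values are truncated rather than rounded to two decimals. Both points are inferred from the numbers and are not stated in the source. Under that assumption, `average` would be `score / 37` and `percentage` would be `score / (37 * 5)`. The constants and function names are hypothetical. Most rows of the table reproduce this way, though not all of them do.

```python
# Hypothetical constants inferred from the table, not given in the source.
NUM_PROMPTS = 37   # assumed number of evaluation prompts
MAX_SCORE = 5      # each answer is scored on a 0~5 scale


def average_score(total: int) -> str:
    """Mean score per prompt, truncated (not rounded) to two decimals."""
    hundredths = total * 100 // NUM_PROMPTS
    return f"{hundredths // 100}.{hundredths % 100:02d}"


def percentage(total: int) -> str:
    """Share of the maximum attainable score, truncated to two decimals."""
    hundredths = total * 10000 // (NUM_PROMPTS * MAX_SCORE)
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


# Row for komt-mistral-7b-v1-dpo: score 142
print(average_score(142), percentage(142))  # 3.83 76.75%
```

Integer arithmetic is used so that the truncation does not depend on floating-point rounding.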