Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


ko-gemma-2-9b-it - bnb 4bits
- Model creator: https://huggingface.co/rtzr/
- Original model: https://huggingface.co/rtzr/ko-gemma-2-9b-it/




Original model description:
---
license: gemma
library_name: transformers
pipeline_tag: text-generation
extra_gated_heading: Access Gemma on Hugging Face
extra_gated_prompt: >-
  To access Gemma on Hugging Face, you're required to review and agree to
  Google's usage license. To do this, please ensure you're logged in to Hugging
  Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
tags:
- conversational
base_model:
- google/gemma-2-9b
language:
- ko
---



## Model Details

### Ko-Gemma-2-9B-IT

**[Ko-Gemma-2-9B-IT](https://huggingface.co/rtzr/ko-gemma-2-9b-it)** is a Korean-language conversational model in the Gemma family. It is a text-to-text, decoder-only large language model, available in Korean. We fine-tuned it on a carefully curated, high-quality dataset using Supervised Fine-Tuning (SFT), followed by [Direct Preference Optimization](https://arxiv.org/abs/2305.18290) (DPO) training to align it with human feedback. The datasets include:

- [Orca-Math](https://huggingface.co/datasets/kuotient/orca-math-korean-dpo-pairs)
- [dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k)

Some of these datasets were partially used and translated for training. Because the translation process introduced a great deal of repetition, we applied N-gram-based preprocessing to filter it out.
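The exact preprocessing code is not published; the sketch below illustrates one way an N-gram-based repetition filter could work. The 4-gram size and the 0.2 threshold are illustrative assumptions, not the values used for this model:

```python
from collections import Counter

def ngram_repetition_ratio(text: str, n: int = 4) -> float:
    """Fraction of n-grams in `text` that are duplicates (0.0 means no repetition)."""
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values() if c > 1)
    return repeated / len(ngrams)

def filter_translations(samples: list[str], threshold: float = 0.2) -> list[str]:
    """Drop translated samples whose 4-gram repetition ratio exceeds the threshold."""
    return [s for s in samples if ngram_repetition_ratio(s) <= threshold]
```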

#### *Inputs and outputs*

- **Input:** Text string, such as a question, a prompt, or a document to be summarized.
- **Output:** Generated Korean-language text in response to the input, such as an answer to a question, or a summary of a document.

### Google Gemma 2

Gemma is a family of lightweight, state-of-the-art open models from Google,
built from the same research and technology used to create the Gemini models.
They are text-to-text, decoder-only large language models, available in English,
with open weights for both pre-trained variants and instruction-tuned variants.
Gemma models are well-suited for a variety of text generation tasks, including
question answering, summarization, and reasoning. Their relatively small size
makes it possible to deploy them in environments with limited resources such as
a laptop, desktop or your own cloud infrastructure, democratizing access to
state of the art AI models and helping foster innovation for everyone.

## Benchmark Scores

We evaluated the model internally using the [LogicKor](https://github.com/instructkr/LogicKor) code. While the public LogicKor leaderboard uses GPT-4 as the judge, our internal evaluation used GPT-4o; public scores will be added as they are released. The scores below are 0-shot evaluations only. Each category reports scores as single-turn / multi-turn, and the Overall column is the mean of the single-turn ("Single ALL") and multi-turn ("Multi ALL") averages.

| Model | Math | Reasoning | Writing | Coding | Understanding | Grammar | Single ALL | Multi ALL | Overall |
|:---------:|:-----:|:------:|:-----:|:-----:|:----:|:-----:|:-----:|:-----:|:----:|
| [rtzr/ko-gemma-2-9b-it](https://huggingface.co/rtzr/ko-gemma-2-9b-it) | 8.71 / 8.00  | 9.14 / 8.00 | 9.43 / 9.29 | 9.00 / 9.43 | 9.57 / 9.86 | 7.14 / 5.00 | 8.83 | 8.26 | 8.55  |
| [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) | 8.57 / 7.71  | 8.86 / 7.00 | 9.29 / 9.29 | 9.29 / 9.57 | 8.57 / 8.29 | 6.86 / 3.86 | 8.57  | 7.62  | 8.10 |
| [MLP-KTLim/llama-3-Korean-Bllossom-8B](https://huggingface.co/MLP-KTLim/llama-3-Korean-Bllossom-8B) | 6.43 / 5.71  | 6.86 / 5.14 | 9.14 / 8.57 | 8.29 / 8.14 | 8.43 / 9.29  | 5.71 / 5.29 | 7.48 | 7.02 | 7.25 |
| [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0) | 5.57 / 4.29  | 8.14 / 5.14 | 8.29 / 6.29 | 6.43 / 7.86 | 9.29 / 8.57  | 6.57 / 3.71 | 7.38 | 5.98 | 6.68 |
| [allganize/Llama-3-Alpha-Ko-8B-Instruct](https://huggingface.co/allganize/Llama-3-Alpha-Ko-8B-Instruct) | 4.57 / 3.00  | 6.86 / 6.43 | 7.43 / 6.71 | 8.43 / 8.43| 7.71 / 8.71 | 6.71 / 4.43 | 6.95 | 6.29  | 6.62  |

## Usage

### Install Dependencies

You must install `transformers>=4.42.3` to use Gemma 2 models.

```bash
pip install transformers==4.42.3 accelerate
```
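If you are unsure which version is already installed, a quick check like the following can save a confusing load error. This snippet is illustrative and not part of the original card; it assumes the `packaging` library is available (it ships with most pip environments):

```python
from importlib.metadata import version
from packaging.version import Version

installed = Version(version("transformers"))
assert installed >= Version("4.42.3"), f"transformers {installed} is too old for Gemma 2"
```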

### Python code with Pipeline

```python
import transformers
import torch

model_id = "rtzr/ko-gemma-2-9b-it"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

pipeline.model.eval()
instruction = "서울의 유명한 관광 코스를 만들어줄래?"  # "Can you put together a famous sightseeing course for Seoul?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Stop on either the tokenizer's EOS token or Gemma's turn delimiter.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Print only the newly generated text, stripping the prompt prefix.
print(outputs[0]["generated_text"][len(prompt):])
```

Example output (translated from Korean):

```markdown
Seoul is a charming city where history, culture, and modernity come together in harmony. Here is an introduction to the many attractions you can enjoy, organized into three well-known sightseeing courses.

**1. A course steeped in history and culture**

1. **Gyeongbokgung Palace**: Here you can take in a grand royal palace of the Joseon dynasty. The banquet event held every spring makes the palace look even more beautiful.
2. **Bukchon Hanok Village**: A cluster of old-style hanok houses where you can try traditional culture first-hand. The village's culture-experience center runs programs ranging from hanbok fittings to paper crafts and Hangul calligraphy.
3. **Insadong**: Full of bookstores, galleries, and Korean restaurants. Its culture hall in particular hosts exhibitions that help you understand Seoul's history and culture.
4. **Gwanghwamun** and **Myeongdong**: Lined with modern shops and restaurants. Gwanghwamun draws a young crowd, so you can watch street fashion go by or feel the buzz of the night streets.

**2. A viewpoint course overlooking the city**

1. **Namsan Tower**: A symbol of Seoul; the 360-degree view from the top is stunning, and the lights at night make it even more beautiful.
2. **Seoul Tower**: Close to Namsan Tower but taller, so it offers an even wider view. Inside there are exhibition halls and restaurants.
3. **Bukaksan**: A mountain in the heart of Seoul that shows the city from a slightly different angle; the summit area also offers fine views.
4. **Seoul Forest**: A green space where you can escape the bustle of the city. Its art space lets you experience art and nature together.

**3. A modern-culture course**

1. **Samseong-dong**: Dense with contemporary art galleries, with landmarks such as COEX close by.
2. **Itaewon**: A neighborhood popular with foreigners, offering food from around the world; its global culture center runs programs on cultures from many countries.
3. **Hongdae**: Overflowing with youth culture; its live music halls are especially popular, and its bookstore streets are great for reading and culture.
4. **Gangnam**: Shows Seoul's modern side well, with upscale shopping malls and restaurants clustered around Gangnam Station.

These courses let you see Seoul's many sides in one trip. Adjust them to suit your own taste, and have a great journey!
```

### Python code with AutoModel

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "rtzr/ko-gemma-2-9b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

model.eval()
instruction = "서울의 유명한 관광 코스를 만들어줄래?"  # "Can you put together a famous sightseeing course for Seoul?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the tokenizer's EOS token or Gemma's turn delimiter.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Example output (translated from Korean):

```markdown
Here is a suggested Seoul sightseeing course, laid out as a route you can enjoy over a full day.

### 1. Seoul Museum of History and Bukchon Hanok Village (morning)

- Seoul Museum of History: A place to experience Seoul's history and culture. Its permanent and special exhibitions trace how the city has changed.
- Bukchon Hanok Village: An area where Seoul's traditional hanok houses are preserved and maintained. You can soak up a Joseon-era atmosphere, and many of the hanok also host cultural programs.

### 2. Bukaksan entry and hike (morning)

- Bukaksan is a mountain in the north of Seoul where you can meet nature in the middle of the city. Start the hike at the entrance; the summit rewards you with a panoramic view of Seoul.

### 3. Jongno and Myeongdong shopping and food tour (midday)

- Myeongdong: Full of shopping malls and stores to browse.
- Food tour: Myeongdong has plenty of places serving local dishes; tteokbokki, sundae, and dakgangjeong are all worth a try.

### 4. Seoul Museum of Art and Deoksugung (afternoon)

- Seoul Museum of Art: Exhibits contemporary art; worth a visit if a special exhibition is on.
- Deoksugung: A Joseon-dynasty palace, especially beautiful in spring when the cherry blossoms are in full bloom.

### 5. Namsan Tower and a walk in Namsan Park (afternoon)

- Namsan Tower: An observation tower on Namsan; from the top you get a 360-degree view of Seoul.
- Namsan Park: A well-landscaped park on Namsan with several themed areas, perfect for a restful stroll.

### 6. Dinner and evening culture in Myeongdong or Itaewon (evening)

- Myeongdong: Serves a wide range of traditional Korean food and stays lively well into the night.
- Itaewon: Popular with international visitors, with cuisines from around the world and plenty of clubs and bars for a night out.

This course is planned so you can travel actively all day. Allow for travel time between areas, and check opening hours and exhibition schedules in advance. Have a great trip!
```

### Quantized Versions through bitsandbytes

- *Using 8-bit precision*
- *Using 4-bit precision*

```python
# pip install bitsandbytes
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "rtzr/ko-gemma-2-9b-it"

quantization_config_8bit = BitsAndBytesConfig(load_in_8bit=True)
# quantization_config_4bit = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config_8bit,
    # quantization_config=quantization_config_4bit,
    low_cpu_mem_usage=True,
)

model.eval()
instruction = "서울의 유명한 관광 코스를 만들어줄래?"  # "Can you put together a famous sightseeing course for Seoul?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the tokenizer's EOS token or Gemma's turn delimiter.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
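Since this repository hosts the bnb 4-bit quantization, a more explicit 4-bit configuration may also be useful. The following is a minimal sketch using common NF4 settings (quant type, double quantization, bfloat16 compute); these particular values are illustrative assumptions, not the settings used to produce this upload:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative NF4 setup; adjust to your hardware and memory budget.
quantization_config_nf4 = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # normalized-float-4 quantization
    bnb_4bit_use_double_quant=True,      # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "rtzr/ko-gemma-2-9b-it",
    device_map="auto",
    quantization_config=quantization_config_nf4,
)
```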

### vLLM Usage

With `vllm==0.5.1`, the Gemma 2 model cannot be loaded yet (see this [issue](https://github.com/vllm-project/vllm/issues/6237)), so we recommend using the `vllm/vllm-openai:latest` Docker image or [`vllm==0.5.0.post1`](https://github.com/vllm-project/vllm/releases/tag/v0.5.0.post1).

```bash
#!/bin/bash

VLLM_ATTENTION_BACKEND=FLASHINFER
MODEL_NAME="rtzr/ko-gemma-2-9b-it"
MODEL_PATH="YOUR_PATH/${MODEL_NAME}"

docker run --rm --gpus all \
    -p 8000:8000 \
    --shm-size=12gb --ulimit memlock=-1 --ulimit stack=67108864 \
    -e VLLM_ATTENTION_BACKEND=${VLLM_ATTENTION_BACKEND} \
    -v $MODEL_PATH:/vllm-workspace/${MODEL_NAME} \
    vllm/vllm-openai:latest \
        --model ${MODEL_NAME} --dtype auto \
        --gpu-memory-utilization 0.8
```
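Once the container is up, it serves an OpenAI-compatible API on port 8000. A minimal client call might look like the sketch below; the base URL and placeholder API key simply reflect the launch command above and are assumptions, not part of the original card:

```python
# pip install openai
from openai import OpenAI

# vLLM's OpenAI-compatible server does not require a real key by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="rtzr/ko-gemma-2-9b-it",
    messages=[{"role": "user", "content": "서울의 유명한 관광 코스를 만들어줄래?"}],
    max_tokens=2048,
    temperature=0.6,
    top_p=0.9,
)
print(response.choices[0].message.content)
```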

## License

Gemma 2 License: <https://ai.google.dev/gemma/terms>

## Citation

```none
@article{RTZR,
  title={ko-gemma-2-9b-it},
  author={Return Zero Team},
  year={2024},
  url={https://huggingface.co/rtzr/ko-gemma-2-9b-it}
}
```

```none
@article{gemma_2024,
  title={Gemma},
  url={https://www.kaggle.com/m/3301},
  DOI={10.34740/KAGGLE/M/3301},
  publisher={Kaggle},
  author={Gemma Team},
  year={2024}
}
```