Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

ko-gemma-2-9b-it - GGUF
- Model creator: https://huggingface.co/rtzr/
- Original model: https://huggingface.co/rtzr/ko-gemma-2-9b-it/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [ko-gemma-2-9b-it.Q2_K.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q2_K.gguf) | Q2_K | 3.54GB |
| [ko-gemma-2-9b-it.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.IQ3_XS.gguf) | IQ3_XS | 3.86GB |
| [ko-gemma-2-9b-it.IQ3_S.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.IQ3_S.gguf) | IQ3_S | 4.04GB |
| [ko-gemma-2-9b-it.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q3_K_S.gguf) | Q3_K_S | 4.04GB |
| [ko-gemma-2-9b-it.IQ3_M.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.IQ3_M.gguf) | IQ3_M | 4.19GB |
| [ko-gemma-2-9b-it.Q3_K.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q3_K.gguf) | Q3_K | 4.43GB |
| [ko-gemma-2-9b-it.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q3_K_M.gguf) | Q3_K_M | 4.43GB |
| [ko-gemma-2-9b-it.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q3_K_L.gguf) | Q3_K_L | 4.78GB |
| [ko-gemma-2-9b-it.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.IQ4_XS.gguf) | IQ4_XS | 4.86GB |
| [ko-gemma-2-9b-it.Q4_0.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q4_0.gguf) | Q4_0 | 5.07GB |
| [ko-gemma-2-9b-it.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.IQ4_NL.gguf) | IQ4_NL | 5.1GB |
| [ko-gemma-2-9b-it.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q4_K_S.gguf) | Q4_K_S | 5.1GB |
| [ko-gemma-2-9b-it.Q4_K.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q4_K.gguf) | Q4_K | 5.37GB |
| [ko-gemma-2-9b-it.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q4_K_M.gguf) | Q4_K_M | 5.37GB |
| [ko-gemma-2-9b-it.Q4_1.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q4_1.gguf) | Q4_1 | 5.55GB |
| [ko-gemma-2-9b-it.Q5_0.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q5_0.gguf) | Q5_0 | 6.04GB |
| [ko-gemma-2-9b-it.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q5_K_S.gguf) | Q5_K_S | 6.04GB |
| [ko-gemma-2-9b-it.Q5_K.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q5_K.gguf) | Q5_K | 6.19GB |
| [ko-gemma-2-9b-it.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q5_K_M.gguf) | Q5_K_M | 6.19GB |
| [ko-gemma-2-9b-it.Q5_1.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q5_1.gguf) | Q5_1 | 6.52GB |
| [ko-gemma-2-9b-it.Q6_K.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q6_K.gguf) | Q6_K | 7.07GB |
| [ko-gemma-2-9b-it.Q8_0.gguf](https://huggingface.co/RichardErkhov/rtzr_-_ko-gemma-2-9b-it-gguf/blob/main/ko-gemma-2-9b-it.Q8_0.gguf) | Q8_0 | 9.15GB |
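
To run one of these files locally, a minimal sketch with the `llama-cpp-python` bindings (not part of the original card; the chosen file, context size, and sampling settings are illustrative assumptions):

```python
# Sketch: load a GGUF quant from the table above with llama-cpp-python.
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="ko-gemma-2-9b-it.Q4_K_M.gguf",  # any file from the table
    n_ctx=4096,        # context window to allocate
    n_gpu_layers=-1,   # offload all layers when built with GPU support
)

# Recent llama.cpp builds read Gemma 2's chat template from the GGUF metadata.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "서울의 유명한 관광 코스를 만들어줄래?"}],
    max_tokens=512,
    temperature=0.6,
    top_p=0.9,
)
print(out["choices"][0]["message"]["content"])
```
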

Original model description:
---
license: gemma
library_name: transformers
pipeline_tag: text-generation
extra_gated_heading: Access Gemma on Hugging Face
extra_gated_prompt: >-
  To access Gemma on Hugging Face, you're required to review and agree to
  Google's usage license. To do this, please ensure you're logged in to Hugging
  Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
tags:
- conversational
base_model:
- google/gemma-2-9b
language:
- ko
---

## Model Details

### Ko-Gemma-2-9B-IT

**[Ko-Gemma-2-9B-IT](https://huggingface.co/rtzr/ko-gemma-2-9b-it)** is a Korean-language conversational model in the Gemma family: a text-to-text, decoder-only large language model, available in Korean. We fine-tuned it on a carefully curated, high-quality dataset using Supervised Fine-Tuning (SFT), and then applied [Direct Preference Optimization](https://arxiv.org/abs/2305.18290) (DPO) training to align it with human feedback. The datasets include:

- [Orca-Math](https://huggingface.co/datasets/kuotient/orca-math-korean-dpo-pairs)
- [dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k)

Some of these datasets were used only in part and were translated for training. Because the translation process introduced a great deal of repetition, the data was preprocessed with N-gram-based filtering.
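
As a rough illustration of that idea (the actual preprocessing code is not published; the n-gram length and threshold below are assumptions), such a filter might look like:

```python
# Hypothetical n-gram repetition filter, not the authors' actual code:
# drop a sample when any 4-gram of words repeats more than a few times,
# the kind of artifact machine translation tends to introduce.
from collections import Counter

def has_repetition(text: str, n: int = 4, max_repeats: int = 3) -> bool:
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return False
    return Counter(ngrams).most_common(1)[0][1] > max_repeats

translated_samples = ["..."]  # placeholder for translated training samples
clean_samples = [s for s in translated_samples if not has_repetition(s)]
```
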

#### *Inputs and outputs*

- **Input:** Text string, such as a question, a prompt, or a document to be summarized.
- **Output:** Generated Korean-language text in response to the input, such as an answer to a question or a summary of a document.

### Google Gemma 2

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained variants and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.
## Benchmark Scores

We evaluated the model internally using the [LogicKor](https://github.com/instructkr/LogicKor) code. While the public LogicKor evaluation uses GPT-4 as the judge, our internal evaluation used GPT-4o. Public scores will be added as they are released. The scores below are 0-shot evaluations only.

| Model | Math | Reasoning | Writing | Coding | Understanding | Grammar | Single ALL | Multi ALL | Overall |
|:---------:|:-----:|:------:|:-----:|:-----:|:----:|:-----:|:-----:|:-----:|:----:|
| [rtzr/ko-gemma-2-9b-it](https://huggingface.co/rtzr/ko-gemma-2-9b-it) | 8.71 / 8.00 | 9.14 / 8.00 | 9.43 / 9.29 | 9.00 / 9.43 | 9.57 / 9.86 | 7.14 / 5.00 | 8.83 | 8.26 | 8.55 |
| [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) | 8.57 / 7.71 | 8.86 / 7.00 | 9.29 / 9.29 | 9.29 / 9.57 | 8.57 / 8.29 | 6.86 / 3.86 | 8.57 | 7.62 | 8.10 |
| [MLP-KTLim/llama-3-Korean-Bllossom-8B](https://huggingface.co/MLP-KTLim/llama-3-Korean-Bllossom-8B) | 6.43 / 5.71 | 6.86 / 5.14 | 9.14 / 8.57 | 8.29 / 8.14 | 8.43 / 9.29 | 5.71 / 5.29 | 7.48 | 7.02 | 7.25 |
| [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0) | 5.57 / 4.29 | 8.14 / 5.14 | 8.29 / 6.29 | 6.43 / 7.86 | 9.29 / 8.57 | 6.57 / 3.71 | 7.38 | 5.98 | 6.68 |
| [allganize/Llama-3-Alpha-Ko-8B-Instruct](https://huggingface.co/allganize/Llama-3-Alpha-Ko-8B-Instruct) | 4.57 / 3.00 | 6.86 / 6.43 | 7.43 / 6.71 | 8.43 / 8.43 | 7.71 / 8.71 | 6.71 / 4.43 | 6.95 | 6.29 | 6.62 |
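
The two numbers in each category cell appear to be single-turn / multi-turn scores, and the aggregate columns are consistent with simple averages; a quick check (our reading of the table, not documented by the authors):

```python
# Recompute the aggregate columns for rtzr/ko-gemma-2-9b-it, assuming each
# cell is "single-turn / multi-turn" and the ALL columns are plain means.
single = [8.71, 9.14, 9.43, 9.00, 9.57, 7.14]
multi = [8.00, 8.00, 9.29, 9.43, 9.86, 5.00]

single_all = sum(single) / len(single)  # 8.83, matches "Single ALL"
multi_all = sum(multi) / len(multi)     # 8.26, matches "Multi ALL"
overall = (single_all + multi_all) / 2  # 8.55, matches "Overall"
print(round(single_all, 2), round(multi_all, 2), round(overall, 2))
```
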
## Usage

### Install Dependencies

You must install `transformers>=4.42.3` to run Gemma 2 models.

```bash
pip install transformers==4.42.3 accelerate
```

### Python code with Pipeline

```python
import transformers
import torch


model_id = "rtzr/ko-gemma-2-9b-it"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

pipeline.model.eval()
instruction = "서울의 유명한 관광 코스를 만들어줄래?"  # "Could you put together a famous sightseeing course for Seoul?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Stop on either EOS or Gemma's <end_of_turn> token.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

print(outputs[0]["generated_text"][len(prompt):])
```

The model answers in Korean; a sample response, translated into English:

```markdown
Seoul is a charming city where history, culture, and modernity come together in harmony. Here are three famous sightseeing courses that cover its many attractions.

**1. A history and culture course**

1. **Gyeongbokgung Palace**: Take in the majestic royal palace of the Joseon Dynasty. The 'Chunchuyeonhoe' festival held every spring makes the palace even more striking.
2. **Bukchon Hanok Village**: A cluster of graceful traditional hanok houses where you can try traditional culture. The 'Bukchon Hanok Village Culture Experience Center' runs programs ranging from hanbok try-ons to paper crafts and Hangul calligraphy.
3. **Insadong**: Full of bookshops, galleries, and Korean restaurants. The 'Insadong Culture Center' hosts exhibits that help you understand Seoul's history and culture.
4. **Gwanghwamun** and **Myeongdong**: Lined with modern shops and restaurants. Gwanghwamun in particular draws young crowds; watch the street fashion or soak up the lively night scene.

**2. A view-tour course overlooking the city**

1. **Namsan Tower**: Seoul's iconic landmark; the 360-degree view from the top is breathtaking, and it becomes even more beautiful when the lights come on at night.
2. **Seoul Tower**: In a similar spot to Namsan Tower but taller, so it offers an even wider view. Inside there are exhibition halls and restaurants.
3. **Bukaksan**: A mountain in the heart of Seoul that shows the city from a slightly different angle, with fine views from the summit.
4. **Seoul Forest**: A green space where you can escape the bustle of the city; the 'Seoul Forest Art Present' space inside combines art and nature.

**3. A modern culture course**

1. **Samseong-dong**: Home to many contemporary art venues such as the 'Samsung Museum of Art', with landmarks like 'COEX' close by.
2. **Itaewon**: A neighborhood popular with foreigners, offering food from around the world; the 'Itaewon Global Culture Center' hosts cultural experiences from many countries.
3. **Hongdae**: Overflowing with youth culture. The 'Rolling Hall' is an especially popular venue, and the 'Hongdae book street' pairs reading with culture.
4. **Gangnam**: Shows off Seoul's modern side, with upscale malls and restaurants clustered around 'Gangnam Station'.

These courses let you experience Seoul's many faces at once; adjust them to your own taste. Have a wonderful trip!
```

### Python code with AutoModel

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


model_id = "rtzr/ko-gemma-2-9b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

model.eval()
instruction = "서울의 유명한 관광 코스를 만들어줄래?"  # same Seoul sightseeing prompt as above

messages = [
    {"role": "user", "content": f"{instruction}"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Another sample response, translated from the Korean output:

```markdown
Here is a suggested Seoul sightseeing course, organized as a route you can enjoy over a full day.

### 1. Seoul History Museum and Bukchon Hanok Village (morning)

- Seoul History Museum: Experience Seoul's history and culture; the permanent and special exhibits trace how the city has changed.
- Bukchon Hanok Village: Preserves Seoul's traditional hanok houses. You can feel the atmosphere of the Joseon Dynasty, and many hanok also offer cultural programs.

### 2. Bukaksan entrance and hike (morning)

- Bukaksan is a mountain on the north side of Seoul where you can meet nature in the middle of the city. Start at the entrance and climb to the summit for a panoramic view of Seoul.

### 3. Jongno and Myeongdong shopping and food tour (midday)

- Myeongdong: Packed with malls and shops; visit Myeongdong Shopping Town, 'Mister Twister', 'Mystery Market', and more.
- Food tour: Myeongdong has many places serving local dishes; we recommend spots for tteokbokki, sundae, and dakgangjeong.

### 4. Seoul Museum of Art and Deoksugung Palace (afternoon)

- Seoul Museum of Art: Exhibits contemporary art; worth a visit when a special exhibition is on.
- Deoksugung: A Joseon-era palace, especially beautiful in spring when the cherry blossoms bloom.

### 5. Namsan Tower and a walk in Namsan Park (afternoon)

- Namsan Tower: An observation tower on Namsan; from the top you get a 360-degree view of Seoul.
- Namsan Park: A well-landscaped park with themed gardens; stroll through it and take a rest.

### 6. Dinner and cultural activities in Myeongdong or Itaewon (evening)

- Myeongdong: Eat a wide range of traditional Korean food; the neighborhood stays lively well into the night.
- Itaewon: Popular with foreign visitors; eat dishes from all over the world, with plenty of clubs and bars for a night out.

This course is planned so you can stay active all day. Factor in travel time between areas, and check opening hours and exhibition schedules in advance. Have a wonderful trip!
```

### Quantized Versions through bitsandbytes

- *Using 8-bit precision*
- *Using 4-bit precision*

```python
# pip install bitsandbytes
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig


model_id = "rtzr/ko-gemma-2-9b-it"
quantization_config_8bit = BitsAndBytesConfig(load_in_8bit=True)
# quantization_config_4bit = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config_8bit,
    # quantization_config=quantization_config_4bit,
    low_cpu_mem_usage=True,
)

model.eval()
instruction = "서울의 유명한 관광 코스를 만들어줄래?"  # same Seoul sightseeing prompt as above

messages = [
    {"role": "user", "content": f"{instruction}"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

### VLLM Usage

With `vllm==0.5.1`, Gemma 2 models cannot yet be loaded (see this [issue](https://github.com/vllm-project/vllm/issues/6237)), so we recommend using the `vllm/vllm-openai:latest` Docker image or [`vllm==0.5.0.post1`](https://github.com/vllm-project/vllm/releases/tag/v0.5.0.post1).

```bash
#!/bin/bash

VLLM_ATTENTION_BACKEND=FLASHINFER
MODEL_NAME="rtzr/ko-gemma-2-9b-it"

# Host directory that holds the downloaded model weights.
MODEL_PATH="YOUR_PATH/${MODEL_NAME}"
docker run --rm --gpus all \
    -p 8000:8000 \
    --shm-size=12gb --ulimit memlock=-1 --ulimit stack=67108864 \
    -e VLLM_ATTENTION_BACKEND=${VLLM_ATTENTION_BACKEND} \
    -v $MODEL_PATH:/vllm-workspace/${MODEL_NAME} \
    vllm/vllm-openai:latest \
    --model ${MODEL_NAME} --dtype auto \
    --gpu-memory-utilization 0.8
```
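
Once the container is up, it serves an OpenAI-compatible API on port 8000. A minimal client sketch (not from the original card; the `openai` Python package and the parameters below are assumptions, and the model name must match the `--model` argument above):

```python
# Query the vLLM OpenAI-compatible server started by the docker command above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="rtzr/ko-gemma-2-9b-it",  # must match --model on the server
    messages=[{"role": "user", "content": "서울의 유명한 관광 코스를 만들어줄래?"}],
    temperature=0.6,
    top_p=0.9,
    max_tokens=512,
)
print(response.choices[0].message.content)
```
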

## License

Gemma 2 License: <https://ai.google.dev/gemma/terms>

## Citation

```none
@article{RTZR,
  title={ko-gemma-2-9b-it},
  author={Return Zero Team},
  year={2024},
  url={https://huggingface.co/rtzr/ko-gemma-2-9b-it}
}
```

```none
@article{gemma_2024,
  title={Gemma},
  url={https://www.kaggle.com/m/3301},
  DOI={10.34740/KAGGLE/M/3301},
  publisher={Kaggle},
  author={Gemma Team},
  year={2024}
}
```