Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

ko-gemma-2-9b-it - bnb 4bits
- Model creator: https://huggingface.co/rtzr/
- Original model: https://huggingface.co/rtzr/ko-gemma-2-9b-it/

Original model description:
---
license: gemma
library_name: transformers
pipeline_tag: text-generation
extra_gated_heading: Access Gemma on Hugging Face
extra_gated_prompt: >-
  To access Gemma on Hugging Face, you're required to review and agree to
  Google's usage license. To do this, please ensure you're logged in to Hugging
  Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
tags:
- conversational
base_model:
- google/gemma-2-9b
language:
- ko
---

## Model Details

### Ko-Gemma-2-9B-IT

**[Ko-Gemma-2-9B-IT](https://huggingface.co/rtzr/ko-gemma-2-9b-it)** is a Korean-language conversational model in the Gemma family. It is a text-to-text, decoder-only large language model, available in Korean. We fine-tuned this model on a carefully curated, high-quality dataset using supervised fine-tuning (SFT), followed by [Direct Preference Optimization](https://arxiv.org/abs/2305.18290) (DPO) training on human-feedback data. The datasets include:

- [Orca-Math](https://huggingface.co/datasets/kuotient/orca-math-korean-dpo-pairs)
- [dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k)

Some of these datasets were used only partially and were translated for training. Because the translation process introduced a lot of repetition, we performed N-gram-based preprocessing to filter it out, along the lines of the sketch below.
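
The repository does not publish the exact filtering rules; the following is a minimal sketch of what N-gram-based repetition filtering typically looks like. The trigram size and the 0.3 threshold are illustrative assumptions, not the authors' values:

```python
from collections import Counter

def ngram_repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of n-grams that duplicate an earlier n-gram in the text."""
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values() if c > 1)
    return repeated / len(ngrams)

def keep_sample(text: str, threshold: float = 0.3) -> bool:
    # Illustrative threshold: drop translated samples whose trigram
    # repetition ratio exceeds 0.3 (not the authors' published value).
    return ngram_repetition_ratio(text) <= threshold
```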

#### *Inputs and outputs*

- **Input:** A text string, such as a question, a prompt, or a document to be summarized.
- **Output:** Generated Korean-language text in response to the input, such as an answer to a question or a summary of a document.

### Google Gemma 2

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.

## Benchmark Scores

We evaluated the model internally using the [LogicKor](https://github.com/instructkr/LogicKor) code. While the public LogicKor evaluation uses GPT-4 as the judge, our internal evaluation used GPT-4o. Public scores will be added as they are released. The scores below are 0-shot evaluations only.

In each category, scores are shown as single-turn / multi-turn.

| Model | Math | Reasoning | Writing | Coding | Understanding | Grammar | Single ALL | Multi ALL | Overall |
|:---------:|:-----:|:------:|:-----:|:-----:|:----:|:-----:|:-----:|:-----:|:----:|
| [rtzr/ko-gemma-2-9b-it](https://huggingface.co/rtzr/ko-gemma-2-9b-it) | 8.71 / 8.00 | 9.14 / 8.00 | 9.43 / 9.29 | 9.00 / 9.43 | 9.57 / 9.86 | 7.14 / 5.00 | 8.83 | 8.26 | 8.55 |
| [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) | 8.57 / 7.71 | 8.86 / 7.00 | 9.29 / 9.29 | 9.29 / 9.57 | 8.57 / 8.29 | 6.86 / 3.86 | 8.57 | 7.62 | 8.10 |
| [MLP-KTLim/llama-3-Korean-Bllossom-8B](https://huggingface.co/MLP-KTLim/llama-3-Korean-Bllossom-8B) | 6.43 / 5.71 | 6.86 / 5.14 | 9.14 / 8.57 | 8.29 / 8.14 | 8.43 / 9.29 | 5.71 / 5.29 | 7.48 | 7.02 | 7.25 |
| [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0) | 5.57 / 4.29 | 8.14 / 5.14 | 8.29 / 6.29 | 6.43 / 7.86 | 9.29 / 8.57 | 6.57 / 3.71 | 7.38 | 5.98 | 6.68 |
| [allganize/Llama-3-Alpha-Ko-8B-Instruct](https://huggingface.co/allganize/Llama-3-Alpha-Ko-8B-Instruct) | 4.57 / 3.00 | 6.86 / 6.43 | 7.43 / 6.71 | 8.43 / 8.43 | 7.71 / 8.71 | 6.71 / 4.43 | 6.95 | 6.29 | 6.62 |

## Usage

### Install Dependencies

You must install `transformers>=4.42.3` to run Gemma 2 models.

```bash
pip install transformers==4.42.3 accelerate
```
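
If you are unsure which version is installed in your environment, a quick check like the following (using the `packaging` helper that ships alongside pip) can save a confusing load error later:

```python
# Sanity check that the installed transformers supports Gemma 2.
import transformers
from packaging import version

assert version.parse(transformers.__version__) >= version.parse("4.42.3"), (
    f"transformers {transformers.__version__} is too old for Gemma 2; "
    "please upgrade to 4.42.3 or newer."
)
```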

### Python code with Pipeline

```python
import transformers
import torch


model_id = "rtzr/ko-gemma-2-9b-it"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

pipeline.model.eval()
instruction = "서울의 유명한 관광 코스를 만들어줄래?"  # "Could you put together a famous sightseeing course for Seoul?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Gemma 2 closes each turn with <end_of_turn>, so stop on it as well as on EOS.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Print only the newly generated text, stripping the echoed prompt.
print(outputs[0]["generated_text"][len(prompt):])
```

```markdown
Seoul is a charming city where history, culture, and modernity come together. Here are some of the many attractions you can enjoy. Below are three well-known Seoul sightseeing courses.

**1. A history and culture course**

1. **Gyeongbokgung Palace**: Where you can take in the grand royal palace of the Joseon dynasty. The festival held there every spring makes the palace look even more beautiful.
2. **Bukchon Hanok Village**: A cluster of traditional hanok houses where you can experience traditional culture first-hand, with programs ranging from hanbok try-ons to paper crafts and Hangul calligraphy at the village's culture center.
3. **Insadong**: An area full of bookstores, galleries, and Korean restaurants. Its culture center hosts exhibitions that help you understand Seoul's history and culture.
4. **Gwanghwamun** and **Myeongdong**: Packed with modern shops and restaurants. Gwanghwamun in particular draws a young crowd, so you can watch street fashion go by or soak up the lively night scene.

**2. A viewpoint course overlooking the city**

1. **Namsan Tower**: A symbol of Seoul; the 360-degree view from the top is breathtaking, and it becomes even more beautiful at night when the lights come on.
2. **Seoul Tower**: In a similar spot to Namsan Tower but taller, so it offers an even wider view; inside there are exhibition halls and restaurants.
3. **Bugaksan**: A mountain in the heart of Seoul that shows the city from a slightly different angle, with a fine view from the summit.
4. **Seoul Forest**: A green space where you can escape the bustle of the city, with an art space inside where you can experience art and nature together.

**3. A modern-culture course**

1. **Samseong-dong**: Home to several contemporary art museums and galleries, with landmarks such as COEX close by.
2. **Itaewon**: Popular with foreigners and a great place to enjoy international food, with cultural experiences from around the world.
3. **Hongdae**: Overflowing with youth culture; its live-music halls draw big crowds, and its bookstore streets mix reading with culture.
4. **Gangnam**: Shows off Seoul's modern side, with upscale shopping malls and restaurants centered around Gangnam Station.

These courses let you experience Seoul's many faces in one trip. Feel free to adjust them to your own taste. Enjoy your travels!
```
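
For reference, the prompt string that `apply_chat_template` builds above follows Gemma 2's turn format; you can inspect it directly. A minimal sketch (the Korean greeting is just a placeholder input):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("rtzr/ko-gemma-2-9b-it")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "안녕하세요"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
# Prints something like:
# <bos><start_of_turn>user
# 안녕하세요<end_of_turn>
# <start_of_turn>model
```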

### Python code with AutoModel

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


model_id = "rtzr/ko-gemma-2-9b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

model.eval()
instruction = "서울의 유명한 관광 코스를 만들어줄래?"  # "Could you put together a famous sightseeing course for Seoul?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Gemma 2 closes each turn with <end_of_turn>, so stop on it as well as on EOS.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

```markdown
Here is a suggested Seoul sightseeing course, organized as a route you can enjoy over a full day.

### 1. Seoul Museum of History and Bukchon Hanok Village (morning)

- Seoul Museum of History: A place to experience Seoul's history and culture. Its permanent and special exhibitions trace how the city has changed.
- Bukchon Hanok Village: Where Seoul's traditional hanok houses are preserved and maintained. You can feel the atmosphere of the Joseon dynasty, and many of the hanok also offer cultural programs.

### 2. Bugaksan entry and hike (morning)

- Bugaksan is a mountain on the northern side of Seoul where you can meet nature in the middle of the city. Start at the mountain entrance and climb to the summit for a panoramic view of Seoul.

### 3. Jongno and Myeongdong shopping and food tour (midday)

- Myeongdong: Lined with shopping malls and stores; the main shopping streets and markets are all within walking distance.
- Food tour: Myeongdong has many places serving local specialties; tteokbokki, sundae, and dakgangjeong are all worth a taste.

### 4. Seoul Museum of Art and Deoksugung Palace (afternoon)

- Seoul Museum of Art: Exhibits contemporary art; worth a visit if a special exhibition is on.
- Deoksugung: A palace from the Joseon dynasty, especially beautiful in spring when the cherry blossoms are in full bloom.

### 5. Namsan Tower and a walk in Namsan Park (afternoon)

- Namsan Tower: An observation tower on Namsan; from the top you get a 360-degree view of Seoul.
- Namsan Park: A well-landscaped park on Namsan with several themed areas; a stroll here makes a good rest stop.

### 6. Dinner and evening culture in Myeongdong or Itaewon (evening)

- Myeongdong: Offers a wide range of traditional Korean food and stays lively well into the night.
- Itaewon: Popular with foreign visitors; you can eat food from all over the world, and the many clubs and bars make for a good night out.

This course is planned so you can stay active all day. Allow for travel time between areas, and check opening hours and exhibition schedules in advance. Enjoy your trip!
```

### Quantized Versions through bitsandbytes

- *Using 8-bit precision*
- *Using 4-bit precision*

```python
# pip install bitsandbytes
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig


model_id = "rtzr/ko-gemma-2-9b-it"
quantization_config_8bit = BitsAndBytesConfig(load_in_8bit=True)
# quantization_config_4bit = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config_8bit,
    # quantization_config=quantization_config_4bit,
    low_cpu_mem_usage=True,
)

model.eval()
instruction = "서울의 유명한 관광 코스를 만들어줄래?"  # "Could you put together a famous sightseeing course for Seoul?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
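
To see how much memory the quantized weights actually occupy, `transformers` models expose `get_memory_footprint()`; a quick check might look like this (the printed number will vary with the chosen precision):

```python
# Rough memory check after loading; get_memory_footprint() returns bytes.
footprint_gb = model.get_memory_footprint() / 1024**3
print(f"Model weights occupy roughly {footprint_gb:.1f} GiB")
```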

### vLLM Usage

With `vllm==0.5.1`, the Gemma 2 model cannot yet be loaded (see this [issue](https://github.com/vllm-project/vllm/issues/6237)), so we recommend using the `vllm/vllm-openai:latest` Docker image or [`vllm==0.5.0.post1`](https://github.com/vllm-project/vllm/releases/tag/v0.5.0.post1).

```bash
#!/bin/bash

VLLM_ATTENTION_BACKEND=FLASHINFER
MODEL_NAME="rtzr/ko-gemma-2-9b-it"

MODEL_PATH="YOUR_PATH/${MODEL_NAME}"
docker run --rm --gpus all \
    -p 8000:8000 \
    --shm-size=12gb --ulimit memlock=-1 --ulimit stack=67108864 \
    -e VLLM_ATTENTION_BACKEND=${VLLM_ATTENTION_BACKEND} \
    -v $MODEL_PATH:/vllm-workspace/${MODEL_NAME} \
    vllm/vllm-openai:latest \
    --model ${MODEL_NAME} --dtype auto \
    --gpu-memory-utilization 0.8
```
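
Once the container is up, it serves an OpenAI-compatible API on the mapped port. A minimal client sketch (assumes the `openai` Python package; the port and model name match the script above):

```python
# pip install openai
from openai import OpenAI

# vLLM's OpenAI-compatible server; the API key is unused but required by the client.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="rtzr/ko-gemma-2-9b-it",
    messages=[{"role": "user", "content": "서울의 유명한 관광 코스를 만들어줄래?"}],
    temperature=0.6,
    top_p=0.9,
    max_tokens=1024,
)
print(response.choices[0].message.content)
```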

## License

Gemma 2 License: <https://ai.google.dev/gemma/terms>

## Citation

```none
@article{RTZR,
  title={ko-gemma-2-9b-it},
  author={Return Zero Team},
  year={2024},
  url={https://huggingface.co/rtzr/ko-gemma-2-9b-it}
}
```

```none
@article{gemma_2024,
  title={Gemma},
  url={https://www.kaggle.com/m/3301},
  DOI={10.34740/KAGGLE/M/3301},
  publisher={Kaggle},
  author={Gemma Team},
  year={2024}
}
```