Quantization made by Richard Erkhov.
[Github](https://github.com/RichardErkhov)
[Discord](https://discord.gg/pvy7H8DZMG)
[Request more models](https://github.com/RichardErkhov/quant_request)
ko-gemma-2-9b-it - bnb 4bits
- Model creator: https://huggingface.co/rtzr/
- Original model: https://huggingface.co/rtzr/ko-gemma-2-9b-it/
Original model description:
---
license: gemma
library_name: transformers
pipeline_tag: text-generation
extra_gated_heading: Access Gemma on Hugging Face
extra_gated_prompt: >-
To access Gemma on Hugging Face, you're required to review and agree to
Google's usage license. To do this, please ensure you're logged in to Hugging
Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
tags:
- conversational
base_model:
- google/gemma-2-9b
language:
- ko
---
## Model Details
### Ko-Gemma-2-9B-IT
**[Ko-Gemma-2-9B-IT](https://huggingface.co/rtzr/ko-gemma-2-9b-it)** is a Korean-language conversational model in the Gemma family. It is a text-to-text, decoder-only large language model, available in Korean. We fine-tuned it on a carefully curated, high-quality dataset using Supervised Fine-Tuning (SFT), then applied [Direct Preference Optimization](https://arxiv.org/abs/2305.18290) training to align it with human feedback. The datasets include:
- [Orca-Math](https://huggingface.co/datasets/kuotient/orca-math-korean-dpo-pairs)
- [dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k)
Some of these datasets were used only in part and were translated into Korean for training. Because the translation process introduced a great deal of repetition, we preprocessed the data with an n-gram-based repetition filter.
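The exact preprocessing code is not published; a minimal sketch of an n-gram-based repetition filter of the kind described, with a hypothetical `threshold` parameter, might look like this:

```python
from collections import Counter


def ngram_repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of n-grams that are duplicates; 0.0 means no repetition."""
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def filter_repetitive(samples, n: int = 3, threshold: float = 0.2):
    """Drop translated samples whose n-gram repetition exceeds the threshold."""
    return [s for s in samples if ngram_repetition_ratio(s, n) <= threshold]
```

A sample like `"go go go go go go"` scores 0.75 and would be dropped, while ordinary prose scores near 0.0; the threshold would be tuned on the actual translated data.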
#### *Inputs and outputs*
- **Input:** Text string, such as a question, a prompt, or a document to be summarized.
- **Output:** Generated Korean-language text in response to the input, such as an answer to a question, or a summary of a document.
### Google Gemma 2
Gemma is a family of lightweight, state-of-the-art open models from Google,
built from the same research and technology used to create the Gemini models.
They are text-to-text, decoder-only large language models, available in English,
with open weights for both pre-trained variants and instruction-tuned variants.
Gemma models are well-suited for a variety of text generation tasks, including
question answering, summarization, and reasoning. Their relatively small size
makes it possible to deploy them in environments with limited resources such as
a laptop, desktop or your own cloud infrastructure, democratizing access to
state of the art AI models and helping foster innovation for everyone.
## Benchmark Scores
We evaluated it internally using the [LogicKor](https://github.com/instructkr/LogicKor) code. While the public LogicKor leaderboard uses GPT-4 as the judge, our internal evaluation used GPT-4o. Public scores will be added as they are released. The scores below are 0-shot only; each category cell lists the single-turn / multi-turn score.
| Model | Math | Reasoning | Writing | Coding | Understanding | Grammar | Single ALL | Multi ALL | Overall |
|:---------:|:-----:|:------:|:-----:|:-----:|:----:|:-----:|:-----:|:-----:|:----:|
| [rtzr/ko-gemma-2-9b-it](https://huggingface.co/rtzr/ko-gemma-2-9b-it) | 8.71 / 8.00 | 9.14 / 8.00 | 9.43 / 9.29 | 9.00 / 9.43 | 9.57 / 9.86 | 7.14 / 5.00 | 8.83 | 8.26 | 8.55 |
| [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) | 8.57 / 7.71 | 8.86 / 7.00 | 9.29 / 9.29 | 9.29 / 9.57 | 8.57 / 8.29 | 6.86 / 3.86 | 8.57 | 7.62 | 8.10 |
| [MLP-KTLim/llama-3-Korean-Bllossom-8B](https://huggingface.co/MLP-KTLim/llama-3-Korean-Bllossom-8B) | 6.43 / 5.71 | 6.86 / 5.14 | 9.14 / 8.57 | 8.29 / 8.14 | 8.43 / 9.29 | 5.71 / 5.29 | 7.48 | 7.02 | 7.25 |
| [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0) | 5.57 / 4.29 | 8.14 / 5.14 | 8.29 / 6.29 | 6.43 / 7.86 | 9.29 / 8.57 | 6.57 / 3.71 | 7.38 | 5.98 | 6.68 |
| [allganize/Llama-3-Alpha-Ko-8B-Instruct](https://huggingface.co/allganize/Llama-3-Alpha-Ko-8B-Instruct) | 4.57 / 3.00 | 6.86 / 6.43 | 7.43 / 6.71 | 8.43 / 8.43| 7.71 / 8.71 | 6.71 / 4.43 | 6.95 | 6.29 | 6.62 |
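The aggregate columns appear to follow from the per-category cells: Single ALL and Multi ALL average the first and second number in each cell, and Overall averages those two. A quick check for the top row:

```python
# Per-category scores for rtzr/ko-gemma-2-9b-it, taken from the table above
# (first number in each cell = single-turn, second = multi-turn).
single = [8.71, 9.14, 9.43, 9.00, 9.57, 7.14]
multi = [8.00, 8.00, 9.29, 9.43, 9.86, 5.00]

single_all = sum(single) / len(single)   # reported as 8.83
multi_all = sum(multi) / len(multi)      # reported as 8.26
overall = (single_all + multi_all) / 2   # reported as 8.55

assert abs(single_all - 8.83) < 0.01
assert abs(multi_all - 8.26) < 0.01
assert abs(overall - 8.55) < 0.01
```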
## Usage
### Install Dependencies
You must install `transformers >= 4.42.3` to run Gemma 2 models.
```bash
pip install transformers==4.42.3 accelerate
```
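For reference, Gemma's chat template wraps each turn in `<start_of_turn>`/`<end_of_turn>` markers with the roles `user` and `model`. Below is a simplified sketch of the prompt string that `apply_chat_template` builds in the examples that follow; the authoritative template ships with the tokenizer:

```python
def gemma_chat_prompt(messages, add_generation_prompt=True):
    """Simplified sketch of the Gemma 2 chat format.

    Gemma uses the roles 'user' and 'model'; 'assistant' is mapped to 'model'.
    """
    prompt = "<bos>"
    for m in messages:
        role = "model" if m["role"] == "assistant" else m["role"]
        prompt += f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n"
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant.
        prompt += "<start_of_turn>model\n"
    return prompt


print(gemma_chat_prompt([{"role": "user", "content": "Hello"}]))
```

This is also why the examples below add `<end_of_turn>` to the `terminators` list: the model signals the end of its reply with that token.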
### Python code with Pipeline
```python
import transformers
import torch
model_id = "rtzr/ko-gemma-2-9b-it"
pipeline = transformers.pipeline(
"text-generation",
model=model_id,
model_kwargs={"torch_dtype": torch.bfloat16},
device_map="auto",
)
pipeline.model.eval()
instruction = "์„œ์šธ์˜ ์œ ๋ช…ํ•œ ๊ด€๊ด‘ ์ฝ”์Šค๋ฅผ ๋งŒ๋“ค์–ด์ค„๋ž˜?"  # "Could you put together a famous sightseeing course for Seoul?"
messages = [
{"role": "user", "content": f"{instruction}"}
]
prompt = pipeline.tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
terminators = [
pipeline.tokenizer.eos_token_id,
pipeline.tokenizer.convert_tokens_to_ids("<end_of_turn>")
]
outputs = pipeline(
prompt,
max_new_tokens=2048,
eos_token_id=terminators,
do_sample=True,
temperature=0.6,
top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```
```markdown
์„œ์šธ์€ ์—ญ์‚ฌ, ๋ฌธํ™”, ํ˜„๋Œ€์„ฑ์ด ์กฐํ™”๋ฅผ ์ด๋ฃฌ ๋งค๋ ฅ์ ์ธ ๋„์‹œ์ž…๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ ์ฆ๊ธธ ์ˆ˜ ์žˆ๋Š” ๋‹ค์–‘ํ•œ ๊ด€๊ด‘์ง€์™€ ๋ช…์†Œ๋ฅผ ์†Œ๊ฐœํ•ฉ๋‹ˆ๋‹ค. ๋‹ค์Œ์€ ์„œ์šธ์˜ ์œ ๋ช…ํ•œ ๊ด€๊ด‘ ์ฝ”์Šค 3๊ฐ€์ง€์ž…๋‹ˆ๋‹ค.
**1. ์—ญ์‚ฌ์™€ ๋ฌธํ™”๋ฅผ ๋‘˜๋Ÿฌ์‹ผ ํ•œ๊ตญ๊ด€๊ด‘์ฝ”์Šค**
1. **๊ฒฝ๋ณต๊ถ**: ์กฐ์„  ์‹œ๋Œ€์˜ ์›…์žฅํ•œ ์™•๊ถ์„ ๋งŒ๋ฝํ•  ์ˆ˜ ์žˆ๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ํŠนํžˆ ๋งค๋…„ ๋ด„์— ์—ด๋ฆฌ๋Š” '์ถ˜์ถ”์—ฐํšŒ'๋Š” ๊ฒฝ๋ณต๊ถ์˜ ์•„๋ฆ„๋‹ค์›€์„ ๋”์šฑ ๋‹๋ณด์ด๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.
2. **๋ถ์ดŒ ํ•œ์˜ฅ๋งˆ์„**: ๊ณ ํ’์Šค๋Ÿฌ์šด ํ•œ์˜ฅ์ด ๋ชจ์—ฌ์žˆ๋Š” ๊ณณ์œผ๋กœ, ์ „ํ†ต ๋ฌธํ™” ์ฒดํ—˜์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. '๋ถ์ดŒ ํ•œ์˜ฅ๋งˆ์„ ๋ฌธํ™”์ฒดํ—˜๊ด€'์—์„œ๋Š” ํ•œ๋ณต ์ฒดํ—˜๋ถ€ํ„ฐ ์ข…์ด๋งŒํ™”, ํ•œ๊ธ€ ์“ฐ๊ธฐ ๋“ฑ ๋‹ค์–‘ํ•œ ํ”„๋กœ๊ทธ๋žจ์ด ์ค€๋น„๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
3. **์ธ์‚ฌ๋™**: ์„œ์ , ๋ฏธ์ˆ ๊ด€, ํ•œ์‹๋‹น์ด ๋งŽ์€ ๊ณณ์ž…๋‹ˆ๋‹ค. ํŠนํžˆ '์ธ์‚ฌ๋™ ๋ฌธํ™”๊ด€'์—์„œ๋Š” ์„œ์šธ์˜ ์—ญ์‚ฌ์™€ ๋ฌธํ™”๋ฅผ ์ดํ•ดํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋˜๋Š” ์ „์‹œ๋ฅผ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
4. **๊ด‘ํ™”๋ฌธ** ๋ฐ **๋ช…๋™**: ํ˜„๋Œ€์ ์ธ ์‡ผํ•‘๊ณผ ๋ ˆ์Šคํ† ๋ž‘์ด ์ฆ๋น„ํ•œ ๊ณณ์ž…๋‹ˆ๋‹ค. ๊ด‘ํ™”๋ฌธ์€ ํŠนํžˆ ์ Š์€์ด๋“ค์ด ๋งŽ์€ ๊ณณ์œผ๋กœ, ์ŠคํŠธ๋ฆฌํŠธ ํŒจ์…˜์„ ๊ด€์ฐฐํ•˜๊ฑฐ๋‚˜ ๋ฐค๊ฑฐ๋ฆฌ์—์„œ ํ™œ๊ธฐ๋ฅผ ๋Š๋‚„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
**2. ๋„์‹œ์˜ ๋ชจ์Šต์„ ๋ฐ”๋ผ๋ณด๋Š” ๋ทฐํˆฌ์–ด ์ฝ”์Šค**
1. **๋‚จ์‚ฐํƒ€์›Œ**: ์„œ์šธ์˜ ์ƒ์ง•์ ์ธ ๊ฑด๋ฌผ๋กœ, ๊ผญ๋Œ€๊ธฐ์—์„œ ํŽผ์ณ์ง€๋Š” 360๋„์˜ ๊ฒฝ์น˜๊ฐ€ ์••๋‹ˆ๋‹ค. ํŠนํžˆ ๋ฐค์ด ๋˜๋ฉด ์กฐ๋ช…์ด ์–ด์šฐ๋Ÿฌ์ ธ ๋”์šฑ ์•„๋ฆ„๋‹ค์›Œ์ง‘๋‹ˆ๋‹ค.
2. **์„œ์šธํƒ€์›Œ**: ๋‚จ์‚ฐํƒ€์›Œ์™€ ๋น„์Šทํ•œ ์œ„์น˜๋กœ, ๋†’์ด๊ฐ€ ๋” ๋†’๊ธฐ ๋•Œ๋ฌธ์— ๋” ๋„“์€ ์ „๋ง์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์„œ์šธํƒ€์›Œ ๋‚ด๋ถ€์—๋Š” ๋‹ค์–‘ํ•œ ์ „์‹œ๊ด€๊ณผ ๋ ˆ์Šคํ† ๋ž‘๋„ ์žˆ์Šต๋‹ˆ๋‹ค.
3. **๋ถ์•…์‚ฐ**: ์„œ์šธ์˜ ์ค‘์‹ฌ๋ถ€์— ์œ„์น˜ํ•œ ์‚ฐ์œผ๋กœ, ์„œ์šธ์˜ ๊ฒฝ์น˜๋ฅผ ์กฐ๊ธˆ ๋‹ค๋ฅธ ๊ด€์ ์—์„œ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํŠนํžˆ ๋ถ์•…์‚ฐ ์ •์ƒ์ธ ๋ถ์•…์‚ฌ์—์„œ๋„ ์ข‹์€ ์ „๋ง์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
4. **์„œ์šธ์ˆฒ**: ๋…น์ง€ ๊ณต๊ฐ„์œผ๋กœ, ๋„์‹œ์˜ ํ˜ผ์žกํ•จ์—์„œ ๋ฒ—์–ด๋‚  ์ˆ˜ ์žˆ๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, ์„œ์šธ์ˆฒ ๋‚ด๋ถ€์—๋Š” '์„œ์šธ์ˆฒ ์•„ํŠธํ”„๋ ˆ์  ํŠธ'๋ผ๋Š” ๊ณต๊ฐ„์ด ์žˆ์–ด ์˜ˆ์ˆ ๊ณผ ์ž์—ฐ์„ ํ•จ๊ป˜ ์ฒดํ—˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
**3. ํ˜„๋Œ€ ๋ฌธํ™”๋ฅผ ๋งŒ๋‚˜๋Š” ์ฝ”์Šค**
1. **์‚ผ์„ฑ๋™**: ํ˜„๋Œ€ ๋ฏธ์ˆ ๊ด€์ด ๋งŽ์€ ๊ณณ์œผ๋กœ, '์‚ผ์„ฑ ๋ฏธ์ˆ ๊ด€', '์•„๋ชจ๋ฆฌ์นด๋‚˜์Šค ๊ฐค๋Ÿฌ๋ฆฌ' ๋“ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ, '์ฝ”์—‘์Šค'๋‚˜ '์•„ํฌ์นด๋กœํฌ์Šค' ๋“ฑ์˜ ๋ช…์†Œ๋„ ๊ฐ€๊นŒ์šด ๊ณณ์— ์žˆ์Šต๋‹ˆ๋‹ค.
2. **์ดํƒœ์›**: ์™ธ๊ตญ์ธ๋“ค์ด ๋งŽ์€ ๊ณณ์œผ๋กœ, ๋‹ค์–‘ํ•œ ์™ธ๊ตญ ์Œ์‹์„ ์ฆ๊ธธ ์ˆ˜ ์žˆ๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, '์ดํƒœ์› ๊ธ€๋กœ์ปฌ๋ฌธํ™”์„ผํ„ฐ'์—์„œ๋Š” ์„ธ๊ณ„ ๊ฐ๊ตญ์˜ ๋ฌธํ™” ์ฒดํ—˜์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.
3. **ํ™๋Œ€**: ์ Š์€์ด๋“ค์˜ ๋ฌธํ™”๊ฐ€ ๋„˜์น˜๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. 'ํ™๋Œ€ ๋กค๋งํ™€'์€ ํŠนํžˆ ๋งŽ์€ ์‚ฌ๋žŒ๋“ค์ด ๋ฐฉ๋ฌธํ•˜๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, 'ํ™๋Œ€ ์„œ์ ๊ฑฐ๋ฆฌ'์—์„œ๋Š” ๋…์„œ์™€ ๋ฌธํ™”๋ฅผ ๋งŒ๋‚  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
4. **๊ฐ•๋‚จ**: ์„œ์šธ์˜ ํ˜„๋Œ€์  ๋ชจ์Šต์„ ์ž˜ ๋ณด์—ฌ์ฃผ๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. '๊ฐ•๋‚จ์—ญ'์„ ์ค‘์‹ฌ์œผ๋กœ ๋งŽ์€ ๊ณ ๊ธ‰ ์‡ผํ•‘๋ชฐ๊ณผ ๋ ˆ์Šคํ† ๋ž‘์ด ์žˆ์Šต๋‹ˆ๋‹ค.
์ด๋Ÿฌํ•œ ์ฝ”์Šค๋ฅผ ํ†ตํ•ด ์„œ์šธ์˜ ๋‹ค์–‘ํ•œ ๋ชจ์Šต์„ ํ•œ ๋ฒˆ์— ๋งŒ๋‚˜๋ณผ ์ˆ˜ ์žˆ์„ ๊ฑฐ์˜ˆ์š”. ๊ฐ์ž์˜ ์ทจํ–ฅ์— ๋งž์ถฐ ์ฝ”์Šค๋ฅผ ์กฐ์ ˆํ•˜์‹œ๋ฉด ์ข‹๊ฒ ์Šต๋‹ˆ๋‹ค. ์ฆ๊ฑฐ์šด ์—ฌํ–‰ ๋˜์„ธ์š”!
```
### Python code with AutoModel
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "rtzr/ko-gemma-2-9b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
)
model.eval()
instruction = "์„œ์šธ์˜ ์œ ๋ช…ํ•œ ๊ด€๊ด‘ ์ฝ”์Šค๋ฅผ ๋งŒ๋“ค์–ด์ค„๋ž˜?"  # "Could you put together a famous sightseeing course for Seoul?"
messages = [
{"role": "user", "content": f"{instruction}"}
]
input_ids = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
return_tensors="pt"
).to(model.device)
terminators = [
tokenizer.eos_token_id,
tokenizer.convert_tokens_to_ids("<end_of_turn>")
]
outputs = model.generate(
input_ids,
max_new_tokens=2048,
eos_token_id=terminators,
do_sample=True,
temperature=0.6,
top_p=0.9,
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
```markdown
Here is a suggested Seoul sightseeing course, laid out as a route you can enjoy over a full day.
### 1. Seoul Museum of History and Bukchon Hanok Village (morning)
- Seoul Museum of History: A place to experience Seoul's history and culture. Through its many exhibits and permanent collection you can trace how the city has changed.
- Bukchon Hanok Village: An area that preserves and maintains Seoul's hanok houses. You can feel the atmosphere of the Joseon Dynasty, and many of the hanok also offer cultural programs.
### 2. Bukaksan trailhead and hike (morning)
- Bukaksan is a mountain on the north side of Seoul where you can meet nature in the heart of the city. Start at the trailhead and climb to the summit for a panoramic view of Seoul.
### 3. Shopping and food tour in Jongno and Myeongdong (midday)
- Myeongdong: An area packed with shopping malls and shops. Try visiting Myeongdong Shopping Town, 'Mr. Twister', 'Mystery Market', and more.
- Food tour: Myeongdong has many places serving regional specialties. We recommend spots for tteokbokki, sundae, dakgangjeong, and more.
### 4. Seoul Museum of Art and Deoksugung Palace (afternoon)
- Seoul Museum of Art: Exhibits contemporary art; worth a visit if a special exhibition is running.
- Deoksugung: A Joseon Dynasty palace. The cherry blossoms in spring are especially beautiful.
### 5. Namsan Tower and a stroll in Namsan Park (afternoon)
- Namsan Tower: An observation tower on Namsan. From the top you get a 360-degree view of Seoul.
- Namsan Park: A well-landscaped park on Namsan with various themed gardens, good for a stroll and a rest.
### 6. Dinner and cultural activities in Myeongdong or Itaewon (evening)
- Myeongdong: Offers a wide range of traditional Korean food, and stays culturally lively well into the night.
- Itaewon: A favorite of foreign tourists, with cuisines from around the world and plenty of clubs and bars for a night out.
This course is planned so you can stay active all day. Allow for travel time between areas, and check opening hours and exhibition schedules in advance. Enjoy your trip!
```
### Quantized Versions through bitsandbytes
- *Using 8-bit precision*
- *Using 4-bit precision*
The example below loads the model in 8-bit; swap in the commented 4-bit config lines to use 4-bit instead.
```python
# pip install bitsandbytes
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
model_id = "rtzr/ko-gemma-2-9b-it"
quantization_config_8bit = BitsAndBytesConfig(load_in_8bit=True)
# quantization_config_4bit = BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
quantization_config=quantization_config_8bit,
# quantization_config=quantization_config_4bit,
low_cpu_mem_usage=True,
)
model.eval()
instruction = "์„œ์šธ์˜ ์œ ๋ช…ํ•œ ๊ด€๊ด‘ ์ฝ”์Šค๋ฅผ ๋งŒ๋“ค์–ด์ค„๋ž˜?"  # "Could you put together a famous sightseeing course for Seoul?"
messages = [
{"role": "user", "content": f"{instruction}"}
]
input_ids = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
return_tensors="pt"
).to(model.device)
terminators = [
tokenizer.eos_token_id,
tokenizer.convert_tokens_to_ids("<end_of_turn>")
]
outputs = model.generate(
input_ids,
max_new_tokens=2048,
eos_token_id=terminators,
do_sample=True,
temperature=0.6,
top_p=0.9,
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
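As a rough guide to which precision fits a given GPU, the weight-only memory footprint of a 9B-parameter model can be estimated from bytes per parameter. These are back-of-envelope numbers: activations and KV cache are extra, and 4-bit NF4 is taken as 0.5 bytes/parameter with quantization constants ignored.

```python
# Approximate weight memory for a 9B-parameter model at each precision.
params = 9e9
for label, bytes_per_param in [("bf16", 2), ("int8", 1), ("nf4", 0.5)]:
    gb = params * bytes_per_param / 1024**3
    print(f"{label}: ~{gb:.1f} GB")
# bf16: ~16.8 GB, int8: ~8.4 GB, nf4: ~4.2 GB
```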
### VLLM Usage
With `vllm==0.5.1`, Gemma 2 models cannot be loaded yet (see this [issue](https://github.com/vllm-project/vllm/issues/6237)). We therefore recommend the `vllm/vllm-openai:latest` Docker image or [`vllm==0.5.0.post1`](https://github.com/vllm-project/vllm/releases/tag/v0.5.0.post1).
```bash
#!/bin/bash
VLLM_ATTENTION_BACKEND=FLASHINFER
MODEL_NAME="rtzr/ko-gemma-2-9b-it"
MODEL_PATH="YOUR_PATH/${MODEL_NAME}"
docker run --rm --gpus all \
-p 8000:8000 \
--shm-size=12gb --ulimit memlock=-1 --ulimit stack=67108864 \
-e VLLM_ATTENTION_BACKEND=${VLLM_ATTENTION_BACKEND} \
-v $MODEL_PATH:/vllm-workspace/${MODEL_NAME} \
vllm/vllm-openai:latest \
--model ${MODEL_NAME} --dtype auto \
--gpu-memory-utilization 0.8
```
## License
Gemma 2 License: <https://ai.google.dev/gemma/terms>
## Citation
```none
@article{RTZR,
title={ko-gemma-2-9b-it},
author={Return Zero Team},
year={2024},
url={https://huggingface.co/rtzr/ko-gemma-2-9b-it}
}
```
```none
@article{gemma_2024,
title={Gemma},
url={https://www.kaggle.com/m/3301},
DOI={10.34740/KAGGLE/M/3301},
publisher={Kaggle},
author={Gemma Team},
year={2024}
}
```