File size: 2,434 Bytes
5a82505
d48b662
 
 
5a82505
d48b662
b9e3ca2
d48b662
 
7d19d2c
57506eb
 
d48b662
d268573
 
7d19d2c
d8b43c8
b9e3ca2
 
 
 
 
 
 
 
 
d8b43c8
d48b662
a37b220
7d19d2c
 
 
78a8bab
063af45
89b4fcd
7d19d2c
30229c0
57506eb
 
c87b109
d268573
c87b109
0b0211b
c87b109
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
---
license: cc-by-nc-4.0
base_model:
- google/gemma-2-2b-it
---

# Gemma-2-2b μ΄ˆλ“±ν•™μƒ κΈ€ λ³€ν™˜κΈ°


## πŸ’» Model Description
- Gemma2-2b ν•œκ΅­ μ΄ˆλ“±ν•™μƒ κΈ€ λ³€ν™˜κΈ°λŠ” μž…λ ₯ν•œ 글을 μ΄ˆλ“±ν•™μƒμ΄ μ“΄ 것 같은 κΈ€λ‘œ λ³€ν™˜ν•΄ μ£ΌλŠ” λͺ¨λΈμž…λ‹ˆλ‹€.
- Gemma2-2b-it λͺ¨λΈμ„ base model둜 μ‚¬μš©ν•˜μ˜€κ³  LoRA기법을 μ‚¬μš©ν•˜μ—¬ 효율적으둜 fine-tuning ν•˜μ˜€μŠ΅λ‹ˆλ‹€. 

(Gemma2-2b is a model that transforms input text to resemble the writing style of an elementary school student. It is based on the Gemma2-2b-it model and was fine-tuned efficiently using the LoRA technique.)

## 🚦Usage
```
import transformers
from huggingface_hub import notebook_login

notebook_login()

BASE_MODEL = "skwh54/Gemma-2-2b-it-elementary-style-document"

model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map={"":0})
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
```

## πŸ“‚ Example
|  Input  |  Output  |
|---------|----------|
| μ•ˆλ…•ν•˜μ„Έμš”. μ €λŠ” μΉœκ΅¬μ™€ κ³΅μ›μ—μ„œ λ§Œλ‚˜μ„œ λ†€μ•˜μ–΄μš”. 날씨가 μ’‹μ•„μ„œ 놀기 μ’‹μ•˜μ–΄μš”. λ§›μžˆλŠ” μŒμ‹λ“€λ„ 많이 λ¨Ήμ—ˆμ–΄μš”. 내일도 놀고 μ‹Άλ‹€! | γ…Žγ…‡ λ‚œ μΉœκ΅¬λž‘ κ³΅μ›μ—μ„œ λ§Œλ‚˜μ„œ λ†€μ•˜λŠ”λ° 날씨가 μ’‹μ•„μ„œ 더 μ’‹μ•˜κ³  λ§›μžˆλŠ” μŒμ‹λ„ 많이 λ¨Ήμ—ˆμŒ 내일도 놀고 μ‹Άλ‹€ |
|μ˜€λŠ˜μ€ νšŒμ‚¬μ—μ„œ μ€‘μš”ν•œ ν”„λ‘œμ νŠΈ 회의λ₯Ό ν–ˆλ‹€. μ€€λΉ„ν•œ 만큼 쒋은 ν”Όλ“œλ°±μ„ λ°›μ•„μ„œ λΏŒλ“―ν–ˆλ‹€. 퇴근 ν›„μ—λŠ” μš΄λ™μ„ ν•˜κ³  λ‚˜μ„œ μ§‘μ—μ„œ 책을 읽으며 ν•˜λ£¨λ₯Ό λ§ˆλ¬΄λ¦¬ν–ˆλ‹€.|였늘 νšŒμ‚¬μ—μ„œ μ€‘μš”ν•œ ν”„λ‘œμ νŠΈ νšŒμ˜ν•¨ γ…‹γ…‹ μ€€λΉ„ν•œ 만큼 쒋은 ν”Όλ“œλ°± λ°›μ•„μ„œ λΏŒλ“―ν•¨ 퇴근 후에 μš΄λ™ν•˜κ³  집에 였면 μ±… 읽으며 ν•˜λ£¨ λ§ˆλ¬΄λ¦¬ν•¨|


## πŸ“ƒ Training data
- [korean_smile_style_dataset](https://github.com/smilegate-ai/korean_smile_style_dataset)을 μ‚¬μš©ν•˜μ˜€μŠ΅λ‹ˆλ‹€.
- λ³Έ λ°μ΄ν„°λŠ” Smilegate AIμ—μ„œ κ³΅κ°œν•˜λŠ” ν•œκ΅­μ–΄ 문체 μŠ€νƒ€μΌ λ³€ν™˜ "SmileStyle" λ°μ΄ν„°μ…‹μž…λ‹ˆλ‹€.
- μ—¬λŸ¬ μŠ€νƒ€μΌμ˜ 문체가 μ‘΄μž¬ν•˜λ©° 이 μ€‘μ—μ„œ formalκ³Ό choding μŠ€νƒ€μΌμ˜ λ°μ΄ν„°λ§Œμ„ μΆ”μΆœν•˜μ—¬ μ‚¬μš©ν•˜μ˜€μŠ΅λ‹ˆλ‹€.

(The korean_smile_style_dataset was used for training. This dataset, "SmileStyle," is provided by Smilegate AI and includes various writing styles in Korean. Only the formal and elementary school styles were used in this model.)

## πŸƒβ€β™‚οΈβ€βž‘οΈ Coworker
[JiwookHan](https://huggingface.co/mreraser)