File size: 7,734 Bytes
8b1fa81
 
1d8e3ee
b5d8ff3
 
 
 
1d8e3ee
 
233ab0f
b5d8ff3
ddf2fc7
b5d8ff3
 
 
 
892b117
1d8e3ee
a150e91
e59356e
ddf2fc7
 
58645f7
2e91466
f7d11dc
 
 
c0dac4c
1d8e3ee
 
c0dac4c
1118c1a
1d8e3ee
 
 
 
 
 
1118c1a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1d8e3ee
 
 
 
 
 
 
 
 
 
 
 
 
 
13cdc80
c0dac4c
 
1d8e3ee
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
bb6a322
 
62cff84
31051bf
ca2be40
bb6a322
 
 
ddf2fc7
bb6a322
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
---
tags:
- finance
- accounting
- stock
- quant
- economics
language:
- ko
license: apache-2.0
datasets:
- aiqwe/FinShibainu
base_model:
- Qwen/Qwen2.5-7B-Instruct
pipeline_tag: question-answering
library_name: transformers
---

# FinShibainu Model Card

+ github: [https://github.com/aiqwe/FinShibainu](https://github.com/aiqwe/FinShibainu)
+ dataset: [https://huggingface.co/datasets/aiqwe/FinShibainu](https://huggingface.co/datasets/aiqwe/FinShibainu)

๋ชจ๋ธ์€ [KRX LLM ๊ฒฝ์ง„๋Œ€ํšŒ ๋ฆฌ๋”๋ณด๋“œ](https://krxbench.koscom.co.kr/)์—์„œ ์šฐ์ˆ˜์ƒ์„ ์ˆ˜์ƒํ•œ shibainu24 ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ์€ ๊ธˆ์œต, ํšŒ๊ณ„ ๋“ฑ ๊ธˆ์œต๊ด€๋ จ ์ง€์‹์— ๋Œ€ํ•œ Text Generation์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.  

+ Vanilla model : [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
  
๋ฐ์ดํ„ฐ์…‹ ์ˆ˜์ง‘ ๋ฐ ํ•™์Šต์— ๊ด€๋ จ๋œ ์ฝ”๋“œ๋Š” [https://github.com/aiqwe/FinShibainu](https://github.com/aiqwe/FinShibainu)์— ์ž์„ธํ•˜๊ฒŒ ๊ณต๊ฐœ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

# Usage
[https://github.com/aiqwe/FinShibainu](https://github.com/aiqwe/FinShibainu)์˜ example์„ ์ฐธ์กฐํ•˜๋ฉด ์‰ฝ๊ฒŒ inference๋ฅผ ํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๋Œ€๋ถ€๋ถ„์˜ Inference๋Š” RTX-3090 ์ด์ƒ์—์„œ ๋‹จ์ผ GPU ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.

```shell
pip install vllm
```

```python
import pandas as pd
from vllm import LLM

inputs = [
    "์™ธํ™˜์‹œ์žฅ์—์„œ ์ผ๋ณธ ์—”ํ™”์™€ ๋ฏธ๊ตญ ๋‹ฌ๋Ÿฌ์˜ ํ™˜์œจ์ด ๋‘ ์‹œ์žฅ์—์„œ ์•ฝ๊ฐ„์˜ ์ฐจ์ด๋ฅผ ๋ณด์ด๊ณ  ์žˆ๋‹ค. ์ด๋•Œ ๋ฌด์œ„ํ—˜ ์ด์ต์„ ์–ป๊ธฐ ์œ„ํ•œ ์ ์ ˆํ•œ ๊ฑฐ๋ž˜ ์ „๋žต์€ ๋ฌด์—‡์ธ๊ฐ€?",
    "์‹ ์ฃผ์ธ์ˆ˜๊ถŒ๋ถ€์‚ฌ์ฑ„(BW)์—์„œ ์ฑ„๊ถŒ์ž๊ฐ€ ์‹ ์ฃผ์ธ์ˆ˜๊ถŒ์„ ํ–‰์‚ฌํ•˜์ง€ ์•Š์„ ๊ฒฝ์šฐ ์–ด๋–ค ์ผ์ด ๋ฐœ์ƒํ•˜๋Š”๊ฐ€?",
    "๊ณต๋งค๋„(Short Selling)์— ๋Œ€ํ•œ ์„ค๋ช…์œผ๋กœ ์˜ณ์ง€ ์•Š์€ ๊ฒƒ์€ ๋ฌด์—‡์ž…๋‹ˆ๊นŒ?"
]

llm = LLM(model="aiqwe/krx-llm-competition", tensor_parallel_size=1)
sampling_params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(inputs, sampling_params)
for o in outputs:
    print(o.prompt)
    print(o.outputs[0].text)
    print("*"*100)
```

# Model Card
| Contents                       | Spec                                |
|--------------------------------|-------------------------------------|
| Base model                     | Qwen2.5-7B-Instruct                |
| dtype                          | bfloat16                           |
| PEFT                           | LoRA (r=8, alpha=64)               |
| Learning Rate                  | 1e-5 (varies by further training)  |
| LRScheduler                    | Cosine (warm-up: 0.05%)            |
| Optimizer                      | AdamW                              |
| Distributed / Efficient Tuning | DeepSpeed v3, Flash Attention      |

# Datset Card
Reference ๋ฐ์ดํ„ฐ์…‹์€ ์ผ๋ถ€ ์ €์ž‘๊ถŒ ๊ด€๊ณ„๋กœ ์ธํ•ด Link๋กœ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
MCQA์™€ QA ๋ฐ์ดํ„ฐ์…‹์€ [https://huggingface.co/datasets/aiqwe/FinShibainu](https://huggingface.co/datasets/aiqwe/FinShibainu)์œผ๋กœ ๊ณต๊ฐœํ•ฉ๋‹ˆ๋‹ค.  
๋˜ํ•œ [https://github.com/aiqwe/FinShibainu](https://github.com/aiqwe/FinShibainu)๋ฅผ ์ด์šฉํ•˜๋ฉด ๋‹ค์–‘ํ•œ ์œ ํ‹ธ๋ฆฌํ‹ฐ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•˜๋ฉฐ, ๋ฐ์ดํ„ฐ ์†Œ์‹ฑ Pipeline์„ ์ฐธ์กฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.  

## References
| ๋ฐ์ดํ„ฐ๋ช…                          | url                                                                                      |
|-----------------------------------|------------------------------------------------------------------------------------------|
| ํ•œ๊ตญ์€ํ–‰ ๊ฒฝ์ œ๊ธˆ์œต ์šฉ์–ด 700์„       | [Link](https://www.bok.or.kr/portal/bbs/B0000249/view.do?nttId=235017&menuNo=200765) |
| ์žฌ๋ฌดํšŒ๊ณ„ ํ•ฉ์„ฑ ๋ฐ์ดํ„ฐ              | ์ž์ฒด ์ œ์ž‘                                                                                        |
| ๊ธˆ์œต๊ฐ๋…์šฉ์–ด์‚ฌ์ „                  | [Link](https://terms.naver.com/list.naver?cid=42088&categoryId=42088) |
| web-text.synthetic.dataset-50k    | [Link](https://huggingface.co/datasets/Cartinoe5930/web_text_synthetic_dataset_50k) |
| ์ง€์‹๊ฒฝ์ œ์šฉ์–ด์‚ฌ์ „                  | [Link](https://terms.naver.com/list.naver?cid=43668&categoryId=43668) |
| ํ•œ๊ตญ๊ฑฐ๋ž˜์†Œ ๋น„์ •๊ธฐ ๊ฐ„ํ–‰๋ฌผ          | [Link](http://open.krx.co.kr/contents/OPN04/04020000/OPN04020000.jsp#b8943a5f87282cde0d653d1ae73431c9=1) |
| ํ•œ๊ตญ๊ฑฐ๋ž˜์†Œ๊ทœ์ •                    | [Link](https://law.krx.co.kr/las/TopFrame.jsp&KRX) |
| ์ดˆ๋ณดํˆฌ์ž์ž ์ฆ๊ถŒ๋”ฐ๋ผ์žก๊ธฐ           | [Link](https://main.krxverse.co.kr/_contents/ACA/02010200/file/220104_beginner.pdf) |
| ์ฒญ์†Œ๋…„์„ ์œ„ํ•œ ์ฆ๊ถŒํˆฌ์ž            | [Link](https://main.krxverse.co.kr/_contents/ACA/02010200/file/220104_teen.pdf) |
| ๊ธฐ์—…์‚ฌ์—…๋ณด๊ณ ์„œ ๊ณต์‹œ์ž๋ฃŒ           | [Link](https://opendart.fss.or.kr/)                              |
| ์‹œ์‚ฌ๊ฒฝ์ œ์šฉ์–ด์‚ฌ์ „                  | [Link](https://terms.naver.com/list.naver?cid=43668&categoryId=43668) |

## MCQA
MCQA ๋ฐ์ดํ„ฐ๋Š” Reference๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋‹ค์ง€์„ ๋‹คํ˜• ๋ฌธ์ œ๋ฅผ ์ƒ์„ฑํ•œ ๋ฐ์ดํ„ฐ์…‹์ž…๋‹ˆ๋‹ค. ๋ฌธ์ œ์™€ ๋‹ต ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ Reasoning ํ…์ŠคํŠธ๊นŒ์ง€ ์ƒ์„ฑํ•˜์—ฌ ํ•™์Šต์— ์ถ”๊ฐ€ํ•˜์˜€์Šต๋‹ˆ๋‹ค.  
ํ•™์Šต์— ์‚ฌ์šฉ๋œ ๋ฐ์ดํ„ฐ๋Š” ์•ฝ 4.5๋งŒ๊ฐœ ๋ฐ์ดํ„ฐ์…‹์ด๋ฉฐ, tiktoken์˜ o200k_base(gpt-4o, gpt-4o-mini Tokenizer)๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์ด 2์ฒœ๋งŒ๊ฐœ์˜ ํ† ํฐ์œผ๋กœ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
| ๋ฐ์ดํ„ฐ๋ช…                             | ๋ฐ์ดํ„ฐ ์ˆ˜ | ํ† ํฐ ์ˆ˜      |
|--------------------------------------|-----------|--------------|
| ํ•œ๊ตญ์€ํ–‰ ๊ฒฝ์ œ๊ธˆ์œต ์šฉ์–ด 700์„          | 1,203     | 277,114      |
| ์žฌ๋ฌดํšŒ๊ณ„ ๋ชฉ์ฐจ๋ฅผ ์ด์šฉํ•œ ํ•ฉ์„ฑ๋ฐ์ดํ„ฐ    | 451       | 99,770       |
| ๊ธˆ์œต๊ฐ๋…์šฉ์–ด์‚ฌ์ „                     | 827       | 214,297      |
| hf_web_text_synthetic_dataset_50k    | 25,461    | 7,563,529    |
| ์ง€์‹๊ฒฝ์ œ์šฉ์–ด์‚ฌ์ „                     | 2,314     | 589,763      |
| ํ•œ๊ตญ๊ฑฐ๋ž˜์†Œ ๋น„์ •๊ธฐ ๊ฐ„ํ–‰๋ฌผ             | 1,183     | 230,148      |
| ํ•œ๊ตญ๊ฑฐ๋ž˜์†Œ๊ทœ์ •                       | 3,015     | 580,556      |
| ์ดˆ๋ณดํˆฌ์ž์ž ์ฆ๊ถŒ๋”ฐ๋ผ์žก๊ธฐ              | 599       | 116,472      |
| ์ฒญ์†Œ๋…„์„ ์œ„ํ•œ ์ฆ๊ถŒ ํˆฌ์ž              | 408       | 77,037       |
| ๊ธฐ์—…์‚ฌ์—…๋ณด๊ณ ์„œ ๊ณต์‹œ์ž๋ฃŒ              | 3,574     | 629,807      |
| ์‹œ์‚ฌ๊ฒฝ์ œ์šฉ์–ด์‚ฌ์ „                     | 7,410     | 1,545,842    |
| **ํ•ฉ๊ณ„**                             | **46,445**| **19,998,931**|

## QA
QA ๋ฐ์ดํ„ฐ๋Š” Reference์™€ ์งˆ๋ฌธ์„ ํ•จ๊ป˜ Input์œผ๋กœ ๋ฐ›์•„ ์ƒ์„ฑํ•œ ๋‹ต๋ณ€๊ณผ Reference ์—†์ด ์งˆ๋ฌธ๋งŒ์„ Input์œผ๋กœ ๋ฐ›์•„ ์ƒ์„ฑํ•œ ๋‹ต๋ณ€ 2๊ฐ€์ง€๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค.  
Reference๋ฅผ ์ œ๊ณต๋ฐ›์œผ๋ฉด ๋ชจ๋ธ์€ ๋ณด๋‹ค ์ •ํ™•ํ•œ ๋‹ต๋ณ€์„ ํ•˜์ง€๋งŒ ๋ชจ๋ธ๋งŒ์˜ ์ง€์‹์ด ์ œํ•œ๋˜์–ด ๋‹ต๋ณ€์ด ์ข€๋” ์งง์•„์ง€๊ฑฐ๋‚˜ ๋‹ค์–‘์„ฑ์ด ์ค„์–ด๋“ค๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
์ด 4.8๋งŒ๊ฐœ์˜ ๋ฐ์ดํ„ฐ์…‹๊ณผ 2์–ต๊ฐœ์˜ ํ† ํฐ์œผ๋กœ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
| ๋ฐ์ดํ„ฐ๋ช…                             | ๋ฐ์ดํ„ฐ ์ˆ˜ | ํ† ํฐ ์ˆ˜      |
|--------------------------------------|-----------|--------------|
| ํ•œ๊ตญ์€ํ–‰ ๊ฒฝ์ œ๊ธˆ์œต ์šฉ์–ด 700์„          | 1,023     | 846,970      |
| ๊ธˆ์œต๊ฐ๋…์šฉ์–ด์‚ฌ์ „                     | 4,128     | 3,181,831    |
| ์ง€์‹๊ฒฝ์ œ์šฉ์–ด์‚ฌ์ „                     | 6,526     | 5,311,890    |
| ํ•œ๊ตญ๊ฑฐ๋ž˜์†Œ ๋น„์ •๊ธฐ ๊ฐ„ํ–‰๋ฌผ             | 1,510     | 1,089,342    |
| ํ•œ๊ตญ๊ฑฐ๋ž˜์†Œ๊ทœ์ •                       | 4,858     | 3,587,059    |
| ๊ธฐ์—…์‚ฌ์—…๋ณด๊ณ ์„œ ๊ณต์‹œ์ž๋ฃŒ              | 3,574     | 629,807      |
| ์‹œ์‚ฌ๊ฒฝ์ œ์šฉ์–ด์‚ฌ์ „                     | 29,920    | 5,981,839    |
| **ํ•ฉ๊ณ„**                             | **47,965**| **199,998,931**|

# Citation
```bibitex
@misc{jaylee2024finshibainu,
  author = {Jay Lee},
  title = {FinShibainu: Korean specified finance model},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/aiqwe/FinShibainu}
}
```