aiqwe commited on
Commit
1d8e3ee
โ€ข
1 Parent(s): 892b117

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +88 -1
README.md CHANGED
@@ -1,5 +1,92 @@
1
  ---
2
  tags:
3
  - krx
 
 
 
 
4
  ---
5
- # Model 1
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  tags:
3
  - krx
4
+ - finance
5
+ license: mit
6
+ language:
7
+ - ko
8
  ---
9
+
10
+ # krx-llm-competition Model Card
11
+
12
+ ๋ชจ๋ธ์€ [KRX LLM ๊ฒฝ์ง„๋Œ€ํšŒ ๋ฆฌ๋”๋ณด๋“œ](https://krxbench.koscom.co.kr/)์—์„œ ์ตœ์ข… 3์œ„๋ฅผ ํ•œ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ์€ ๊ธˆ์œต, ํšŒ๊ณ„ ๋“ฑ ๊ธˆ์œต๊ด€๋ จ ์ง€์‹์— ๋Œ€ํ•œ Text Generation์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
13
+ ๋ฐ์ดํ„ฐ์…‹ ์ˆ˜์ง‘ ๋ฐ ํ•™์Šต์— ๊ด€๋ จ๋œ ์ฝ”๋“œ๋Š” [https://github.com/aiqwe/krx-llm-competition](https://github.com/aiqwe/krx-llm-competition)์— ์ž์„ธํ•˜๊ฒŒ ๊ณต๊ฐœ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
14
+ ์ž์„ธํ•œ ๋‚ด์šฉ์€ [krx_model_card.pdf](krx_model_card.pdf)๋ฅผ ์ฐธ์กฐํ•ด์ฃผ์„ธ์š”.
15
+
16
+ # Usage
17
+ [https://github.com/aiqwe/krx-llm-competition](https://github.com/aiqwe/krx-llm-competition)์˜ example์„ ์ฐธ์กฐํ•˜๋ฉด ์‰ฝ๊ฒŒ inference๋ฅผ ํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
18
+
19
+ ```shell
20
+ pip install vllm
21
+ ```
22
+
23
+ ```python
24
+ TBD
25
+ ```
26
+
27
+ # Model Card
28
+ | Contents | Spec |
29
+ |--------------------------------|-------------------------------------|
30
+ | Base model | Qwen2.5-7B-Instruct |
31
+ | Machine | A100 SXM 80GB ร— 2 |
32
+ | dtype | bfloat16 |
33
+ | PEFT | LoRA (r=8, alpha=64) |
34
+ | Learning Rate | 1e-5 (varies by further training) |
35
+ | LRScheduler | Cosine (warm-up: 0.05%) |
36
+ | Optimizer | AdamW |
37
+ | Distributed / Efficient Tuning | DeepSpeed v3, Flash Attention |
38
+ | Global Batch Size | 128 |
39
+
40
+ # Datset Card
41
+ Reference ๋ฐ์ดํ„ฐ์…‹์€ ์ผ๋ถ€ ์ €์ž‘๊ถŒ ์ด์Šˆ๋กœ ์ธํ•ด Link๋กœ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
42
+ MCQA์™€ QA ๋ฐ์ดํ„ฐ์…‹์€ [https://huggingface.co/datasets/aiqwe/krx-llm-competition](https://huggingface.co/datasets/aiqwe/krx-llm-competition)์œผ๋กœ ๊ณต๊ฐœํ•ฉ๋‹ˆ๋‹ค.
43
+ [https://github.com/aiqwe/krx-llm-competition](https://github.com/aiqwe/krx-llm-competition)๋ฅผ ์ด์šฉํ•˜๋ฉด ๋‹ค์–‘ํ•œ ์œ ํ‹ธ๋ฆฌํ‹ฐ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•˜๋ฉฐ, ๋ฐ์ดํ„ฐ ์†Œ์‹ฑ Pipeline์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
44
+
45
+ ## References
46
+ | ๋ฐ์ดํ„ฐ๋ช… | url |
47
+ |-----------------------------------|------------------------------------------------------------------------------------------|
48
+ | ํ•œ๊ตญ์€ํ–‰ ๊ฒฝ์ œ๊ธˆ์œต ์šฉ์–ด 700์„  | [Link](https://www.bok.or.kr/portal/bbs/B0000249/view.do?nttId=235017&menuNo=200765) |
49
+ | ์žฌ๋ฌดํšŒ๊ณ„ ํ•ฉ์„ฑ ๋ฐ์ดํ„ฐ | ์ž์ฒด ์ œ์ž‘ |
50
+ | ๊ธˆ์œต๊ฐ๋…์šฉ์–ด์‚ฌ์ „ | [Link](https://terms.naver.com/list.naver?cid=42088&categoryId=42088) |
51
+ | web-text.synthetic.dataset-50k | [Link](https://huggingface.co/datasets/Cartinoe5930/web_text_synthetic_dataset_50k) |
52
+ | ์ง€์‹๊ฒฝ์ œ์šฉ์–ด์‚ฌ์ „ | [Link](https://terms.naver.com/list.naver?cid=43668&categoryId=43668) |
53
+ | ํ•œ๊ตญ๊ฑฐ๋ž˜์†Œ ๋น„์ •๊ธฐ ๊ฐ„ํ–‰๋ฌผ | [Link](http://open.krx.co.kr/contents/OPN04/04020000/OPN04020000.jsp#b8943a5f87282cde0d653d1ae73431c9=1) |
54
+ | ํ•œ๊ตญ๊ฑฐ๋ž˜์†Œ๊ทœ์ • | [Link](https://law.krx.co.kr/las/TopFrame.jsp&KRX) |
55
+ | ์ดˆ๋ณดํˆฌ์ž์ž ์ฆ๊ถŒ๋”ฐ๋ผ์žก๊ธฐ | [Link](https://main.krxverse.co.kr/_contents/ACA/02010200/file/220104_beginner.pdf) |
56
+ | ์ฒญ์†Œ๋…„์„ ์œ„ํ•œ ์ฆ๊ถŒํˆฌ์ž | [Link](https://main.krxverse.co.kr/_contents/ACA/02010200/file/220104_teen.pdf) |
57
+ | ๊ธฐ์—…์‚ฌ์—…๋ณด๊ณ ์„œ ๊ณต์‹œ์ž๋ฃŒ | [Link](https://opendart.fss.or.kr/) |
58
+ | ์‹œ์‚ฌ๊ฒฝ์ œ์šฉ์–ด์‚ฌ์ „ | [Link](https://terms.naver.com/list.naver?cid=43668&categoryId=43668) |
59
+
60
+ ## MCQA
61
+ MCQA ๋ฐ์ดํ„ฐ๋Š” Reference๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋‹ค์ง€์„ ๋‹คํ˜• ๋ฌธ์ œ๋ฅผ ์ƒ์„ฑํ•œ ๋ฐ์ดํ„ฐ์…‹์ž…๋‹ˆ๋‹ค. ๋ฌธ์ œ์™€ ๋‹ต ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ Reasoning ํ…์ŠคํŠธ๊นŒ์ง€ ์ƒ์„ฑํ•˜์—ฌ ํ•™์Šต์— ์ถ”๊ฐ€ํ•˜์˜€์Šต๋‹ˆ๋‹ค.
62
+ ํ•™์Šต์— ์‚ฌ์šฉ๋œ ๋ฐ์ดํ„ฐ๋Š” ์•ฝ 4.5๋งŒ๊ฐœ ๋ฐ์ดํ„ฐ์…‹์ด๋ฉฐ, tiktoken์˜ o200k_base(gpt-4o, gpt-4o-mini Tokenizer)๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์ด 2์ฒœ๋งŒ๊ฐœ์˜ ํ† ํฐ์œผ๋กœ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
63
+ | ๋ฐ์ดํ„ฐ๋ช… | ๋ฐ์ดํ„ฐ ์ˆ˜ | ํ† ํฐ ์ˆ˜ |
64
+ |--------------------------------------|-----------|--------------|
65
+ | ํ•œ๊ตญ์€ํ–‰ ๊ฒฝ์ œ๊ธˆ์œต ์šฉ์–ด 700์„  | 1,203 | 277,114 |
66
+ | ์žฌ๋ฌดํšŒ๊ณ„ ๋ชฉ์ฐจ๋ฅผ ์ด์šฉํ•œ ํ•ฉ์„ฑ๋ฐ์ดํ„ฐ | 451 | 99,770 |
67
+ | ๊ธˆ์œต๊ฐ๋…์šฉ์–ด์‚ฌ์ „ | 827 | 214,297 |
68
+ | hf_web_text_synthetic_dataset_50k | 25,461 | 7,563,529 |
69
+ | ์ง€์‹๊ฒฝ์ œ์šฉ์–ด์‚ฌ์ „ | 2,314 | 589,763 |
70
+ | ํ•œ๊ตญ๊ฑฐ๋ž˜์†Œ ๋น„์ •๊ธฐ ๊ฐ„ํ–‰๋ฌผ | 1,183 | 230,148 |
71
+ | ํ•œ๊ตญ๊ฑฐ๋ž˜์†Œ๊ทœ์ • | 3,015 | 580,556 |
72
+ | ์ดˆ๋ณดํˆฌ์ž์ž ์ฆ๊ถŒ๋”ฐ๋ผ์žก๊ธฐ | 599 | 116,472 |
73
+ | ์ฒญ์†Œ๋…„์„ ์œ„ํ•œ ์ฆ๊ถŒ ํˆฌ์ž | 408 | 77,037 |
74
+ | ๊ธฐ์—…์‚ฌ์—…๋ณด๊ณ ์„œ ๊ณต์‹œ์ž๋ฃŒ | 3,574 | 629,807 |
75
+ | ์‹œ์‚ฌ๊ฒฝ์ œ์šฉ์–ด์‚ฌ์ „ | 7,410 | 1,545,842 |
76
+ | **ํ•ฉ๊ณ„** | **46,445**| **19,998,931**|
77
+
78
+ ## QA
79
+ QA ๋ฐ์ดํ„ฐ๋Š” Reference์™€ ์งˆ๋ฌธ์„ ํ•จ๊ป˜ Input์œผ๋กœ ๋ฐ›์•„ ์ƒ์„ฑํ•œ ๋‹ต๋ณ€๊ณผ Reference ์—†์ด ์งˆ๋ฌธ๋งŒ์„ Input์œผ๋กœ ๋ฐ›์•„ ์ƒ์„ฑํ•œ ๋‹ต๋ณ€ 2๊ฐ€์ง€๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค.
80
+ Reference๋ฅผ ์ œ๊ณต๋ฐ›์œผ๋ฉด ๋ชจ๋ธ์€ ๋ณด๋‹ค ์ •ํ™•ํ•œ ๋‹ต๋ณ€์„ ํ•˜์ง€๋งŒ ๋ชจ๋ธ๋งŒ์˜ ์ง€์‹์ด ์ œํ•œ๋˜์–ด ๋‹ต๋ณ€์ด ์ข€๋” ์งง์•„์ง€๊ฑฐ๋‚˜ ๋‹ค์–‘์„ฑ์ด ์ค„์–ด๋“ค๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
81
+ ์ด 4.8๋งŒ๊ฐœ์˜ ๋ฐ์ดํ„ฐ์…‹๊ณผ 2์–ต๊ฐœ์˜ ํ† ํฐ์œผ๋กœ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
82
+ | ๋ฐ์ดํ„ฐ๋ช… | ๋ฐ์ดํ„ฐ ์ˆ˜ | ํ† ํฐ ์ˆ˜ |
83
+ |--------------------------------------|-----------|--------------|
84
+ | ํ•œ๊ตญ์€ํ–‰ ๊ฒฝ์ œ๊ธˆ์œต ์šฉ์–ด 700์„  | 1,023 | 846,970 |
85
+ | ๊ธˆ์œต๊ฐ๋…์šฉ์–ด์‚ฌ์ „ | 4,128 | 3,181,831 |
86
+ | ์ง€์‹๊ฒฝ์ œ์šฉ์–ด์‚ฌ์ „ | 6,526 | 5,311,890 |
87
+ | ํ•œ๊ตญ๊ฑฐ๋ž˜์†Œ ๋น„์ •๊ธฐ ๊ฐ„ํ–‰๋ฌผ | 1,510 | 1,089,342 |
88
+ | ํ•œ๊ตญ๊ฑฐ๋ž˜์†Œ๊ทœ์ • | 4,858 | 3,587,059 |
89
+ | ๊ธฐ์—…์‚ฌ์—…๋ณด๊ณ ์„œ ๊ณต์‹œ์ž๋ฃŒ | 3,574 | 629,807 |
90
+ | ์‹œ์‚ฌ๊ฒฝ์ œ์šฉ์–ด์‚ฌ์ „ | 29,920 | 5,981,839 |
91
+ | **ํ•ฉ๊ณ„** | **47,965**| **199,998,931**|
92
+