---
license: mit
language:
- ko
metrics:
- accuracy
---
# Model Card for KorSciDeBERTa
<!-- Provide a quick summary of what the model is/does. -->
KorSciDeBERTa is a model based on the architecture of Microsoft's DeBERTa, pretrained on 146 GB of Korean text in total: papers, NTIS research projects, patents, news, and Korean Wikipedia. The pretrained model can be used for masked language modeling or next sentence prediction, and it can also be fine-tuned for downstream tasks such as sentence classification, token classification, or question answering.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** KISTI
- **Model type:** deberta-v2
- **Language(s) (NLP):** Korean (ko)
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository 1:** https://huggingface.co/kisti/korscideberta
- **Repository 2:** https://aida.kisti.re.kr/
## Uses
### Downstream Use - Load model directly
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
Clone the repository to get the custom tokenizer code:

```bash
git clone https://huggingface.co/kisti/korscideberta; cd korscideberta
```

The fine-tuning example below follows **korscideberta-abstractcls.ipynb**:

```python
from tokenization_korscideberta import DebertaV2Tokenizer  # custom tokenizer shipped in this repository
from transformers import AutoModelForSequenceClassification

tokenizer = DebertaV2Tokenizer.from_pretrained("kisti/korscideberta")
model = AutoModelForSequenceClassification.from_pretrained(
    "kisti/korscideberta", num_labels=6, hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1
)
# model = AutoModelForMaskedLM.from_pretrained("kisti/korscideberta")

# ... dataset preparation and Trainer construction omitted; see the notebook above ...
train_metrics = trainer.train().metrics
trainer.save_metrics("train", train_metrics)
trainer.push_to_hub()
```
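After fine-tuning, the classifier can be applied to new text. A short inference sketch (illustrative only; the example sentence is a placeholder, and `tokenizer` and `model` are the objects created above):

```python
import torch

# Hypothetical example: classify a Korean abstract sentence with the fine-tuned model.
text = "본 연구에서는 딥러닝 기반 한국어 과학기술 문서 분류 방법을 제안한다."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # index of the predicted class among the 6 labels
```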
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
This model must not be used to intentionally create hostile or alienating environments for people.
The model must not be used in 'high-risk settings'. It is not designed to make consequential decisions about people or things, and its outputs may not be factually accurate.
'High-risk settings' include, among others:
use in the medical, political, legal, or financial domains; evaluation of individuals for employment, education, or credit; automated decision-making on important matters; generation of (fake) facts; generation of summaries that must be highly reliable; generation of predictions that must always be correct; and so on.
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
Only corpus data free of copyright issues was used, for research purposes. Users of this model should be aware of the risk factors below.
Although the corpora used are largely neutral in character, as with any language model the outputs may still contain some of the following ethics-related elements:
over- or under-representation of particular viewpoints, stereotypes, personal information, hateful, insulting, or violent language, discriminatory or prejudiced language, and irrelevant or repetitive outputs.
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
## How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
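Pending an official snippet here, a minimal masked-language-modeling sketch (illustrative only; it assumes the custom tokenizer from the cloned repository above, and the mask token string is taken from the tokenizer's own configuration rather than hard-coded):

```python
import torch
from tokenization_korscideberta import DebertaV2Tokenizer  # from the cloned repository
from transformers import AutoModelForMaskedLM

tokenizer = DebertaV2Tokenizer.from_pretrained("kisti/korscideberta")
model = AutoModelForMaskedLM.from_pretrained("kisti/korscideberta")

# Hypothetical example sentence with one masked token.
text = f"한국어 과학기술 분야 {tokenizer.mask_token} 모델을 학습하였다."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Top prediction for the masked position.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_index].argmax(dim=-1)))
```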
## Training Details
### Training Data
<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
[More Information Needed]
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Preprocessing [optional]
- Science and technology domain tokenizer (KorSci Tokenizer)
- The corpus was preprocessed with a tokenizer that merges the [Mecab-ko Tokenizer](https://bitbucket.org/eunjeon/mecab-ko/src/master/), extended with a user dictionary of roughly 6 million nouns and compound nouns drawn from the pretraining corpus, with the existing SentencePiece-BPE model (a small usage sketch follows this list).
- Vocabulary: 128,100 tokens in total
- Special tokens: `<unk>`, `<cls>`, `<s>`, `<mask>`
- Files: spm.model, vocab.txt
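A minimal sketch of inspecting the tokenizer (illustrative only; it assumes the custom `tokenization_korscideberta` module from the cloned repository, a working Mecab-ko installation, and attribute names from the Hugging Face slow-tokenizer API):

```python
from tokenization_korscideberta import DebertaV2Tokenizer  # custom tokenizer from the repository

tokenizer = DebertaV2Tokenizer.from_pretrained("kisti/korscideberta")

print(tokenizer.vocab_size)          # expected to match the 128,100-token vocabulary above
print(tokenizer.all_special_tokens)  # the special tokens listed above

# Hypothetical scientific sentence; morpheme analysis runs before SentencePiece-BPE.
print(tokenizer.tokenize("딥러닝 기반 신약 후보 물질 탐색 연구"))
```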
#### Training Hyperparameters
- **model_size:** base
- **num_train_steps:** 1,600,000
- **train_batch_size:** 4,096 × 4 gradient-accumulation steps = 16,384 (effective)
- **learning_rate:** 1e-4
- **max_seq_length:** 512
- **vocab_size:** 128,100
- **Training regime:** fp16 mixed precision <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
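The effective batch size on the **train_batch_size** line is simply the per-step batch multiplied by the number of accumulation steps; a trivial check (illustrative only, not the actual pretraining script):

```python
# Effective batch size under gradient accumulation, per the hyperparameters above.
train_batch_size = 4096      # examples processed per forward/backward pass
accumulation_steps = 4       # gradients accumulated over this many passes before an update
effective_batch_size = train_batch_size * accumulation_steps
assert effective_batch_size == 16_384
```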
#### Speeds, Sizes, Times [optional]
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
[More Information Needed]
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Data Card if possible. -->
The language model was evaluated by fine-tuning it on the science and technology standard classification task for research project reports; the results are as follows.
- Research project report science and technology standard classification dataset (doi.org/10.23057/50): 145 classes, 209,454 training examples, 89,767 test examples
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
F1-micro/macro: a sample counts as a success if at least one of its top-3 gold labels is predicted.
F1-strict: credit is given in proportion to how many of the top-3 gold labels are predicted.
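The two scoring rules can be read as lenient and proportional matching against each sample's top-3 gold labels. A minimal sketch of the two criteria (the exact evaluation protocol, function names, and example data are assumptions for illustration, not the official scoring code):

```python
def lenient_success(gold_top3, predicted):
    # Success if at least one of the top-3 gold labels is predicted (basis of F1-micro/macro above).
    return [int(len(set(g) & set(p)) > 0) for g, p in zip(gold_top3, predicted)]

def strict_score(gold_top3, predicted):
    # Credit proportional to how many of the top-3 gold labels are predicted (basis of F1-strict above).
    scores = [len(set(g) & set(p)) / len(g) for g, p in zip(gold_top3, predicted)]
    return sum(scores) / len(scores)

# Hypothetical example: two samples, each with up to three gold classes.
gold = [["A", "B", "C"], ["D", "E"]]
pred = [["A", "X", "Y"], ["D", "E", "Z"]]
print(lenient_success(gold, pred))  # [1, 1]
print(strict_score(gold, pred))     # (1/3 + 2/2) / 2 = 0.666...
```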
### Results
F1-micro: 0.85, F1-macro: 0.52, F1-strict: 0.71
## Technical Specifications
### Model Architecture and Objective
[More Information Needed]
### Compute Infrastructure
KISTI National Supercomputing Center NEURON system: HPE ClusterStor E1000, Lustre, Slurm
#### Hardware
NVIDIA A100 80GB GPU × 24
#### Software
Python 3.9, CUDA 11.8, PyTorch 1.10
## Citation
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
Korea Institute of Science and Technology Information (2023). Korean DeBERTa pretrained model for the science and technology domain (KorSciDeBERTa). Version 1.0. Korea Institute of Science and Technology Information (KISTI).
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Model Card Authors
김경민, 김은희, 김성찬. Artificial Intelligence Data Research Group, Korea Institute of Science and Technology Information (KISTI)
## Model Card Contact
김경민, kkmkorea kisti.re.kr