metadata
language:
  - ko
license: apache-2.0
library_name: transformers
tags:
  - text2text-generation
datasets:
  - aihub
metrics:
  - bleu
  - rouge
model-index:
  - name: ko-TextNumbarT
    results:
      - task:
          type: text2text-generation
          name: text2text-generation
        metrics:
          - type: bleu
            value: 0.958234790096092
            name: eval_bleu
            verified: false
          - type: rouge1
            value: 0.9735361877162854
            name: eval_rouge1
            verified: false
          - type: rouge2
            value: 0.9493975212378124
            name: eval_rouge2
            verified: false
          - type: rougeL
            value: 0.9734558938864928
            name: eval_rougeL
            verified: false
          - type: rougeLsum
            value: 0.9734350757552404
            name: eval_rougeLsum
            verified: false

ko-TextNumbarT(TNT Model🧨): Try Korean Reading To Number (a model that converts Hangul into numbers)


Model Details

  • Model Description: I built this model because, however much I searched, I couldn't find an existing model or algorithm for this job.
    It is a BartForConditionalGeneration model fine-tuned for the task of converting Korean (Hangul) number readings into digits.

  • Dataset: downloaded from Korea aihub
    For personal reasons, I can't release the data used for fine-tuning.

  • Korea aihub data is available ONLY to Koreans!
    Since anyone who downloads the data from aihub is presumably Korean, the original notes in this section were written only in Korean.
    More precisely, the model was trained to translate phonetic transcriptions (the words as spoken) into orthographic transcriptions (the spelled form, with digits), following the ETRI transcription convention.

  • For example, ten million (천만) can be written as 1000만 or as 10000000, so the model's output format depends heavily on its training datasets.

  • Results can differ considerably depending on the spacing between a numeral determiner (수관형사) and a counting bound noun (수 의존명사): 쉰살 ("fifty years old", written without a space) stays 쉰살, while 쉰 살 (with a space) becomes 50살. https://eretz2.tistory.com/34
    Since I don't know how the model will be used, I didn't pick one convention and bias training toward it; I left it to the distribution of the training data. (Is 쉰 살 or 쉰살 more common!?)

  • Developed by: Yoo SungHyun(https://github.com/YooSungHyun)

  • Language(s): Korean

  • License: apache-2.0

  • Parent Model: kobart-base-v2. See its model card for more information about the pre-trained base model.
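The transcription note above can be made concrete with a sketch of how one training pair might be derived from a dual-transcribed line. It assumes the KsponSpeech-style ETRI dual transcription, where a span is written as "(orthographic)/(phonetic)"; the helper `to_training_pair` and the example line are hypothetical, and the author's actual preprocessing (in bart_train.py) may differ.

```python
import re

# ASSUMPTION: KsponSpeech-style ETRI dual transcription, "(orthographic)/(phonetic)".
DUAL_SPAN = re.compile(r"\(([^()]*)\)/\(([^()]*)\)")


def to_training_pair(line: str) -> tuple[str, str]:
    """Hypothetical helper: split one dual-transcribed line into (input, target).

    The input keeps the phonetic side (the reading as spoken) and the target
    keeps the orthographic side (the spelled form with digits). Other
    transcription markers (noise tags, fillers, etc.) are not handled here.
    """
    source = DUAL_SPAN.sub(lambda m: m.group(2), line)
    target = DUAL_SPAN.sub(lambda m: m.group(1), line)
    return source, target


# Invented example line, modeled on the demo sentence of this model card
src, tgt = to_training_pair("그러게 누가 (6시)/(여섯시)까지 술을 마시래?")
print(src)  # 그러게 누가 여섯시까지 술을 마시래?
print(tgt)  # 그러게 누가 6시까지 술을 마시래?
```

Lines without a dual span pass through unchanged on both sides, so they become identity pairs that teach the model to leave ordinary text alone.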
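The ten-million example in the notes above comes down to how Korean groups digits: large numbers are counted in units of 10^4 (만), 10^8 (억) and 10^12 (조), so one value has several valid written forms. The helper below is a hypothetical illustration, not part of the model (which simply learns whatever format its training targets use); it renders an integer in the mixed digit-plus-unit style, spaced at every 만 unit as Korean spelling rules prescribe.

```python
# Korean groups large numbers by powers of 10^4: 만 (10^4), 억 (10^8), 조 (10^12).
MYRIAD_UNITS = [(10**12, "조"), (10**8, "억"), (10**4, "만")]


def to_mixed_notation(n: int) -> str:
    """Hypothetical helper: write n as digits plus myriad units, spaced per unit.

    Units above 조 are not handled; larger counts simply stay in front of 조.
    """
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if n == 0:
        return "0"
    parts = []
    for value, unit in MYRIAD_UNITS:
        count, n = divmod(n, value)  # digits in front of this unit, remainder
        if count:
            parts.append(f"{count}{unit}")
    if n:
        parts.append(str(n))  # leftover below 10^4
    return " ".join(parts)


ten_million = 10_000_000  # 천만
print(to_mixed_notation(ten_million))  # 1000만
print(str(ten_million))                # 10000000
print(to_mixed_notation(123_456_789))  # 1억 2345만 6789
```

Both `1000만` and `10000000` are correct renderings of 천만; which one the model emits is decided by its training data, not by any rule like this one.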

Uses

For more details, see KoGPT_num_converter, especially bart_inference.py and bart_train.py.

Evaluation

Evaluation uses evaluate-metric/bleu and evaluate-metric/rouge from the Hugging Face evaluate library.
Training wandb URL
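For intuition about the ROUGE scores in the metadata, here is a simplified ROUGE-1 F1 (clipped unigram overlap) on whitespace tokens. It is not the evaluate implementation, which tokenizes text differently, so it will not reproduce the reported numbers; `rouge1_f1` is a hypothetical name used only for this sketch.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on whitespace tokens.

    No stemming, no punctuation handling, single reference.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection: each token counts at most min(pred count, ref count) times
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "그러게 누가 6시까지 술을 마시래?"
print(rouge1_f1(reference, reference))                         # 1.0
print(rouge1_f1("그러게 누가 여섯시까지 술을 마시래?", reference))  # about 0.8
```

Because Korean particles attach to the word, one unconverted number (여섯시까지 instead of 6시까지) costs a whole token, which is why a single miss drops this five-token example to 0.8.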

How to Get Started With the Model

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers.pipelines import Text2TextGenerationPipeline

# Input: "Who told you to drink until six (여섯시), anyway?"
texts = ["그러게 누가 여섯시까지 술을 마시래?"]

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("lIlBrother/ko-TextNumbarT")
model = AutoModelForSeq2SeqLM.from_pretrained("lIlBrother/ko-TextNumbarT")
seq2seqlm_pipeline = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)

# Deterministic beam-search decoding settings
kwargs = {
    "min_length": 0,
    "max_length": 1206,
    "num_beams": 100,
    "do_sample": False,
    "num_beam_groups": 1,
}
pred = seq2seqlm_pipeline(texts, **kwargs)
# The pipeline returns one {"generated_text": ...} dict per input text
print(pred[0]["generated_text"])
# 그러게 누가 6시까지 술을 마시래?