Pretrained BART in Korean

This is a BART model pretrained on multiple Korean datasets.

I used multiple datasets so that the model generalizes to both colloquial and written text.

Training was supported by the TPU Research Cloud program.

The script used to pre-train the model is available here.

When you use the Inference API, you must wrap the sentence with [BOS] and [EOS], as in the example below.

[BOS] ์•ˆ๋…•ํ•˜์„ธ์š”? ๋ฐ˜๊ฐ€์›Œ์š”~~ [EOS]

You can also test mask-filling performance using the [MASK] token, like this (see the usage sketch after this example):

[BOS] [MASK] ๋จน์—ˆ์–ด? [EOS]
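
If you want to query the model programmatically instead of through the hosted widget, the sketch below shows one way to do mask filling with the Hugging Face transformers library. The model id "your-namespace/bart-ko" is a placeholder, and compatibility with AutoModelForSeq2SeqLM plus the exact special-token handling are assumptions not confirmed by this card.

```python
# Minimal sketch of mask filling with transformers.
# NOTE: "your-namespace/bart-ko" is a hypothetical model id; replace it with the
# real Hub id. Loading via AutoModelForSeq2SeqLM is an assumption.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "your-namespace/bart-ko"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# The sentence is wrapped with [BOS] and [EOS] manually, as described above,
# so automatic special-token insertion is disabled.
text = "[BOS] [MASK] ๋จน์—ˆ์–ด? [EOS]"
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False)

# BART fills the masked span by generating the complete sequence.
output_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```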

Benchmark

Dataset              Metric      Score
KLUE NLI dev         Acc         0.5253
NSMC test            Acc         0.8425
QuestionPair test    Acc         0.8945
KLUE TC dev          Acc         0.8047
KLUE TC dev          F1          0.7988
KLUE STS dev         F1          0.7411
KLUE STS dev         Pearson     0.7471
KLUE STS dev         Spearman    0.7399
KorSTS dev           F1          0.7725
KorSTS dev           Pearson     0.6503
KorSTS dev           Spearman    0.6191
HateSpeech dev       Bias Acc    0.7537
HateSpeech dev       Hate Acc    0.5605

Datasets Used

๋ชจ๋‘์˜ ๋ง๋ญ‰์น˜

  • ์ผ์ƒ ๋Œ€ํ™” ๋ง๋ญ‰์น˜ 2020
  • ๊ตฌ์–ด ๋ง๋ญ‰์น˜
  • ๋ฌธ์–ด ๋ง๋ญ‰์น˜
  • ์‹ ๋ฌธ ๋ง๋ญ‰์น˜

AIhub

์„ธ์ข… ๋ง๋ญ‰์น˜
