metadata
license: apache-2.0
language:
- ko
pipeline_tag: image-to-text
tags:
- trocr
- vision-encoder-decoder
trocr-small-korean
Model Details
TrOCR is an encoder-decoder model consisting of an image Transformer encoder and a text Transformer decoder. The image encoder was initialized from DeiT weights, and the text decoder was initialized from RoBERTa weights that we trained ourselves.
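To make the encoder-decoder layout concrete, the sketch below assembles a randomly initialized model of the same general shape with the `transformers` library. It assumes the encoder maps to `DeiTConfig` and the decoder to `RobertaConfig`. Every size in it (hidden size, layer count, vocabulary size) is a small hypothetical value chosen so the example builds quickly; none of them is taken from the released checkpoint.

```python
import torch
from transformers import (
    DeiTConfig,
    RobertaConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

# Hypothetical toy sizes, NOT the configuration of trocr-small-korean.
encoder_config = DeiTConfig(
    image_size=384,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)
decoder_config = RobertaConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)

# Combining the configs marks the decoder as causal and enables
# cross-attention over the encoder's image features.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_config, decoder_config
)
model = VisionEncoderDecoderModel(config=config).eval()

# One random 384x384 RGB image and a short, arbitrary decoder prefix.
pixel_values = torch.rand(1, 3, 384, 384)
decoder_input_ids = torch.tensor([[0, 5, 6]])

with torch.no_grad():
    logits = model(
        pixel_values=pixel_values, decoder_input_ids=decoder_input_ids
    ).logits

# One vocabulary distribution per decoder position.
print(logits.shape)
```

Because the weights are random, the logits carry no meaning; the sketch only shows how image features flow from the encoder into the text decoder.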
This model was trained on Cloud TPUs provided through Google's TPU Research Cloud (TRC).
How to Get Started with the Model
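import torch
from transformers import VisionEncoderDecoderModel
# Load the pretrained checkpoint from the Hugging Face Hub.
model = VisionEncoderDecoderModel.from_pretrained("team-lucid/trocr-small-korean")
# Placeholder input: one random 384x384 RGB image. Replace with real preprocessed pixel values.
pixel_values = torch.rand(1, 3, 384, 384)
# Autoregressively generate the token IDs of the recognized text.
generated_ids = model.generate(pixel_values)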
Training Details
Training Data
This model was trained on 6M images synthesized with synthtiger.
Training Hyperparameters
Hyperparameter | Small |
---|---|
Warmup Steps | 4,000 |
Learning Rates | 1e-4 |
Batch Size | 512 |
Weight Decay | 0.01 |
Max Steps | 500,000 |
Learning Rate Decay | 0.1 |
Adam β1 | 0.9 |
Adam β2 | 0.98 |
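
The table does not name the shape of the learning-rate schedule, so the following is only a sketch under stated assumptions: linear warmup to the peak rate over the warmup steps, then linear decay, with "Learning Rate Decay 0.1" read as the final rate being 0.1 times the peak. The function and constant names are hypothetical. The sketch also works out how many samples the listed batch size and step count imply, assuming every step uses a full batch.

```python
PEAK_LR = 1e-4          # "Learning Rates" in the table
WARMUP_STEPS = 4_000    # "Warmup Steps"
MAX_STEPS = 500_000     # "Max Steps"
BATCH_SIZE = 512        # "Batch Size"
FINAL_LR_RATIO = 0.1    # ASSUMPTION: "Learning Rate Decay" read as final/peak ratio
DATASET_SIZE = 6_000_000  # 6M synthetic images


def learning_rate(step: int) -> float:
    """Linear warmup, then linear decay to FINAL_LR_RATIO * PEAK_LR (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS))
    return PEAK_LR * (1.0 - (1.0 - FINAL_LR_RATIO) * progress)


# Total samples processed if every step uses a full batch.
samples_seen = BATCH_SIZE * MAX_STEPS            # 256,000,000
# Approximate number of passes over the 6M-image dataset.
approx_passes = samples_seen / DATASET_SIZE      # about 42.7

for step in (0, 2_000, 4_000, 250_000, 500_000):
    print(step, learning_rate(step))
print(samples_seen, round(approx_passes, 1))
```

If the actual run used a different schedule, such as cosine or inverse square root decay, only the body of `learning_rate` would change; the sample count does not depend on the schedule.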