---
license: apache-2.0
language:
- ko
pipeline_tag: image-to-text
tags:
- trocr
- vision-encoder-decoder
---

# trocr-small-korean

## Model Details

TrOCR is an encoder-decoder model consisting of an image Transformer encoder and a text Transformer decoder.

The image encoder was initialized from DeiT weights, while the text decoder was initialized from RoBERTa weights that we trained ourselves.
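
As a quick way to see this structure, the sketch below loads the checkpoint and prints the encoder and decoder sub-configs. This is only an illustrative check; the exact `model_type` strings depend on how the checkpoint was exported.

```python
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("team-lucid/trocr-small-korean")

# The checkpoint bundles two sub-configs: a vision encoder (DeiT-initialized)
# and a text decoder (initialized from the authors' own RoBERTa weights).
print(model.config.encoder.model_type)
print(model.config.decoder.model_type)
```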

This model was trained on Cloud TPUs supported by Google's TPU Research Cloud (TRC).

## How to Get Started with the Model

```python
import torch
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("team-lucid/trocr-small-korean")

# Dummy input: one 384x384 RGB image tensor, just to verify that generation runs.
pixel_values = torch.rand(1, 3, 384, 384)
generated_ids = model.generate(pixel_values)
```
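
For real images, the model is typically paired with a processor that handles resizing/normalization and decodes the generated token IDs back to text. A minimal sketch, assuming the repository also provides a `TrOCRProcessor` (image processor plus Korean tokenizer) under the same ID; `sample.png` is a placeholder path:

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Assumption: processor files are available under the same repository ID.
processor = TrOCRProcessor.from_pretrained("team-lucid/trocr-small-korean")
model = VisionEncoderDecoderModel.from_pretrained("team-lucid/trocr-small-korean")

# Placeholder path to a cropped text-line image.
image = Image.open("sample.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```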

## Training Details

### Training Data

This model was trained on 6M images synthesized with [synthtiger](https://github.com/clovaai/synthtiger).

### Training Hyperparameters

| Hyperparameter      |   Small |
|:--------------------|--------:|
| Warmup Steps        |   4,000 |
| Learning Rate       |    1e-4 |
| Batch Size          |     512 |
| Weight Decay        |    0.01 |
| Max Steps           | 500,000 |
| Learning Rate Decay |     0.1 |
| Adam \\(\beta_1\\)  |     0.9 |
| Adam \\(\beta_2\\)  |    0.98 |
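
For reference, the table above roughly corresponds to the PyTorch optimizer/scheduler setup sketched below. This is not the actual training script (training ran on Cloud TPUs); in particular, the linear schedule shape and the reading of "Learning Rate Decay 0.1" as the final-to-peak learning-rate ratio are assumptions.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Stand-in parameters; in practice these come from the VisionEncoderDecoderModel.
params = [torch.nn.Parameter(torch.zeros(1))]

peak_lr = 1e-4
warmup_steps = 4_000
max_steps = 500_000
final_lr_ratio = 0.1  # assumption: decay to 10% of the peak learning rate

optimizer = AdamW(params, lr=peak_lr, betas=(0.9, 0.98), weight_decay=0.01)

def lr_lambda(step: int) -> float:
    """Linear warmup for 4,000 steps, then linear decay to final_lr_ratio * peak."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return 1.0 - (1.0 - final_lr_ratio) * min(1.0, progress)

scheduler = LambdaLR(optimizer, lr_lambda)
# Each optimization step processes a batch of 512 samples:
#   optimizer.step(); scheduler.step()
```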
|