---
license: mit
datasets:
- Helsinki-NLP/tatoeba_mt
language:
- ja
- ko
pipeline_tag: translation
tags:
- python
- transformer
- pytorch
---

# Japanese to Korean translator for FFXIV

**FINAL FANTASY is a registered trademark of Square Enix Holdings Co., Ltd.**

Details of this project are available in the [GitHub repo](https://github.com/sappho192/ffxiv-ja-ko-translator).

# Demo

[![demo.gif](demo.gif)](https://huggingface.co/spaces/sappho192/ffxiv-ja-ko-translator-demo)

[Click to try the demo](https://huggingface.co/spaces/sappho192/ffxiv-ja-ko-translator-demo)

# Usage

See [test_eval.ipynb](https://huggingface.co/sappho192/ffxiv-ja-ko-translator/blob/main/test_eval.ipynb) or the section below.

## Inference

```python
# Requires transformers and torch; the Japanese tokenizer also needs fugashi and unidic-lite
from transformers import (
    EncoderDecoderModel,
    PreTrainedTokenizerFast,
    BertJapaneseTokenizer,
)

import torch

encoder_model_name = "cl-tohoku/bert-base-japanese-v2"
decoder_model_name = "skt/kogpt2-base-v2"

# Source (Japanese) and target (Korean) tokenizers
src_tokenizer = BertJapaneseTokenizer.from_pretrained(encoder_model_name)
trg_tokenizer = PreTrainedTokenizerFast.from_pretrained(decoder_model_name)

# Change `./best_model` to the path of the model **directory**
model = EncoderDecoderModel.from_pretrained("./best_model")

text = "ギルガメッシュ討伐戦"
# text = "ギルガメッシュ討伐戦に行ってきます。一緒に行きましょうか?"


def translate(text_src):
    # Tokenize the Japanese input; only `input_ids` are passed to the model
    inputs = src_tokenizer(
        text_src,
        return_attention_mask=False,
        return_token_type_ids=False,
        return_tensors="pt",
    )
    # Generate Korean token IDs, then drop the first (start) and last (end) token
    output = model.generate(**inputs)[0, 1:-1]
    text_trg = trg_tokenizer.decode(output.cpu())
    return text_trg


print(translate(text))
```
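
The `[0, 1:-1]` slice in `translate` assumes every generated sequence starts with exactly one decoder start token and ends with exactly one end-of-sequence token. If generation stops at the length limit without emitting an end token, the fixed slice drops the last real token. Below is a minimal, more defensive sketch that works on a plain list of token IDs. The function name `strip_special_ids` and the example IDs are hypothetical; in practice you would take the IDs from your own checkpoint (for example `model.config.decoder_start_token_id` and `trg_tokenizer.eos_token_id`, assuming both are set).

```python
def strip_special_ids(ids, start_id, eos_id, pad_id=None):
    """Hypothetical helper: drop a leading start token and everything from the first EOS on."""
    tokens = list(ids)
    # Strip the start token first, so this also works when start and end share an ID.
    if tokens and tokens[0] == start_id:
        tokens = tokens[1:]
    # Cut at the first EOS; this also discards any padding that follows it.
    if eos_id in tokens:
        tokens = tokens[: tokens.index(eos_id)]
    # Drop trailing padding left when generation stopped without an EOS.
    if pad_id is not None:
        while tokens and tokens[-1] == pad_id:
            tokens.pop()
    return tokens


# Example IDs are placeholders, not the real vocabulary of any tokenizer.
print(strip_special_ids([1, 10, 11, 12, 1], start_id=1, eos_id=1))         # [10, 11, 12]
print(strip_special_ids([1, 10, 11], start_id=1, eos_id=1))                # [10, 11]
print(strip_special_ids([1, 10, 1, 3, 3], start_id=1, eos_id=1, pad_id=3))  # [10]
```

To use it, you could call it on `model.generate(...)[0].tolist()` before passing the result to `trg_tokenizer.decode`. If the target tokenizer has its special tokens configured, `trg_tokenizer.decode(..., skip_special_tokens=True)` is a built-in alternative.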

## Training

See [training.ipynb](https://huggingface.co/sappho192/ffxiv-ja-ko-translator/blob/main/training.ipynb) for the training procedure.