---
datasets:
- Hoshikuzu/JESC
language:
- en
- ja
base_model:
- openai-community/gpt2
---
This model is a GPT-2-small model trained from scratch as a learning exercise. It uses the tokenizer from Gemma-2-2B-JPN-IT and was trained on the JESC Japanese-English subtitle corpus.
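For reference, a minimal sketch of how such a model might be assembled: a randomly initialized GPT-2-small configuration resized to the Gemma tokenizer's vocabulary. The hyperparameters below are GPT-2-small defaults and the tokenizer id is the public Gemma checkpoint; both are assumptions, not the exact training setup.

```python
from transformers import AutoTokenizer, GPT2Config, GPT2LMHeadModel

# Assumed tokenizer source: the public Gemma-2-2B-JPN-IT checkpoint
tokenizer = AutoTokenizer.from_pretrained('google/gemma-2-2b-jpn-it')

config = GPT2Config(
    vocab_size=len(tokenizer),  # resize embeddings to the Gemma vocabulary
    n_positions=1024,           # GPT-2-small defaults (assumed)
    n_embd=768,
    n_layer=12,
    n_head=12,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
model = GPT2LMHeadModel(config)  # random weights; trained from scratch on JESC
```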
Model usage:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('tirthadagr8/Japanese_to_english_gpt2CasualLM_GemmaTokenizer')
model = AutoModelForCausalLM.from_pretrained('tirthadagr8/Japanese_to_english_gpt2CasualLM_GemmaTokenizer')
model.cuda()

src_text = 'あなたとは遊びたくない'
prompt = f"Translate the following Japanese sentence to English:\n\nJapanese:{src_text}\nEnglish:"

# Encode the prompt, drop its final token, and generate the translation on GPU
input_ids = tokenizer.encode(prompt, return_tensors='pt')[:, :-1].cuda()
print(tokenizer.batch_decode(model.generate(input_ids, max_length=128))[0])
```
OUTPUT:
```
<bos>Translate the following Japanese sentence to English:

Japanese:あなたとは遊びたくない
English:i don't want to play with you.<eos>
```
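For repeated use, the prompt construction and decoding can be wrapped in a small helper. This is a sketch building on the snippet above; the `translate` function and its `max_new_tokens` default are illustrative choices, not part of the released model.

```python
def translate(src_text: str, max_new_tokens: int = 64) -> str:
    """Translate a Japanese sentence to English with the model loaded above."""
    prompt = (f"Translate the following Japanese sentence to English:\n\n"
              f"Japanese:{src_text}\nEnglish:")
    # Drop the final token of the encoded prompt, as in the usage example above
    input_ids = tokenizer.encode(prompt, return_tensors='pt')[:, :-1].cuda()
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Keep only the tokens generated after the prompt
    generated = output_ids[0, input_ids.shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

print(translate('あなたとは遊びたくない'))  # i don't want to play with you.
```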
Citation:
```bibtex
@ARTICLE{pryzant_jesc_2018,
author = {{Pryzant}, R. and {Chung}, Y. and {Jurafsky}, D. and {Britz}, D.},
title = "{JESC: Japanese-English Subtitle Corpus}",
journal = {Language Resources and Evaluation Conference (LREC)},
keywords = {Computer Science - Computation and Language},
year = 2018
}
```