|
--- |
|
library_name: transformers |
|
license: cc-by-nc-4.0 |
|
datasets: |
|
- allenai/nllb |
|
- facebook/flores |
|
language: |
|
- ko |
|
- en |
|
metrics: |
|
- chrf |
|
pipeline_tag: translation |
|
--- |
|
|
|
# NLLB-200 Distilled-350M_en2ko |
|
|
|
The NLLB-200 model showed outstanding performance on translation tasks and contributed to solving problems with low-resource languages.

Even so, it is still hard to run the 600M or 1B+ variants without a sufficient computing environment.

This repository therefore provides a much smaller model specialized in translating English to Korean, and it can be run on a CPU with no mixed precision and no quantization.
|
|
|
|
|
|
|
## Model |
|
|
|
- Base model: NLLB-200 600M
|
- **Parameters: 350,537,728 (350M)** |
|
- **Encoder layers: 12 -> 3** |
|
- **Decoder layers: 12 -> 3** |
|
- FFN dimension: 4096 (same) |
|
- Embed dimension: 1024 (same) |
|
- Vocab size: 256206 (same; a sketch for verifying these values follows this list)
|
|
|
- License: CC-BY-NC-4.0
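
The reduced architecture can be checked against the published checkpoint. The snippet below is a minimal sketch (it assumes the `dhtocks/nllb-200-distilled-350M_en-ko` repository shown in the Usage section): it reads the config and counts the parameters.

```python
from transformers import AutoConfig, AutoModelForSeq2SeqLM

repo = 'dhtocks/nllb-200-distilled-350M_en-ko'

# Inspect the architecture without loading the weights
config = AutoConfig.from_pretrained(repo)
print(config.encoder_layers, config.decoder_layers)               # 3 3
print(config.encoder_ffn_dim, config.d_model, config.vocab_size)  # 4096 1024 256206

# Count the parameters (expected: 350,537,728)
model = AutoModelForSeq2SeqLM.from_pretrained(repo)
print(sum(p.numel() for p in model.parameters()))
```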
|
|
|
## Data |
|
|
|
- Training Data: [NLLB dataset](https://huggingface.co/datasets/allenai/nllb) |
|
- Evaluation Data: [Flores-200 dataset](https://huggingface.co/datasets/facebook/flores) |
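
Both datasets are hosted on the Hugging Face Hub. As a rough sketch (assuming the `eng_Latn-kor_Hang` config name, and noting that recent `datasets` releases may require `trust_remote_code=True` for script-based datasets), the English-Korean portion of Flores-200 can be loaded like this:

```python
from datasets import load_dataset

# Flores-200 English-Korean pair; 'dev' and 'devtest' splits are available
flores = load_dataset('facebook/flores', 'eng_Latn-kor_Hang', trust_remote_code=True)
devtest = flores['devtest']
print(devtest[0]['sentence_eng_Latn'])
print(devtest[0]['sentence_kor_Hang'])
```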
|
|
|
## Metrics

Inference time was measured on the following hardware:

- CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (16 cores)

- GPU: NVIDIA L4 24GB
|
|
|
|
|
|
|
| Model                  | #Params | chrF(++) | GPU Inference time (s) | CPU Inference time (s) |
| ---------------------- | ------- | -------- | ---------------------- | ---------------------- |
| NLLB-200 3.3B          | 3.3B    | 34.3     | 0.98                   | 4.65                   |
| NLLB-200 1.3B          | 1.3B    | 32.1     | 0.89                   | 2.46                   |
| NLLB-200 600M          | 600M    | 32.0     | 0.43                   | 1.52                   |
| NLLB-200 350M (*ours*) | 350M    | 24.6     | 0.24                   | 1.43                   |
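
The table can be reproduced roughly with the sketch below. It is not the exact evaluation script, just one plausible setup: it translates the Flores-200 sentences one at a time, measures the average wall-clock time per sentence, and scores the hypotheses with `sacrebleu`'s chrF++ implementation (the `sacrebleu` package is assumed to be installed).

```python
import time

import sacrebleu
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

repo = 'dhtocks/nllb-200-distilled-350M_en-ko'
model = AutoModelForSeq2SeqLM.from_pretrained(repo, forced_bos_token_id=256098)
tokenizer = AutoTokenizer.from_pretrained(repo, src_lang='eng_Latn', tgt_lang='kor_Hang')

devtest = load_dataset('facebook/flores', 'eng_Latn-kor_Hang', trust_remote_code=True)['devtest']
sources = [row['sentence_eng_Latn'] for row in devtest]
references = [row['sentence_kor_Hang'] for row in devtest]

# Translate one sentence at a time and time the generation loop
hypotheses = []
start = time.perf_counter()
for src in sources:
    inputs = tokenizer(src, return_tensors='pt')
    output = model.generate(**inputs)
    hypotheses.append(tokenizer.decode(output[0], skip_special_tokens=True))
elapsed = time.perf_counter() - start

print(f'{elapsed / len(sources):.2f} s per sentence')
print(sacrebleu.corpus_chrf(hypotheses, [references], word_order=2))  # chrF++
```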
|
|
|
|
|
## Usage |
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# forced_bos_token_id=256098 makes generation start with the Korean (kor_Hang) target language token
model = AutoModelForSeq2SeqLM.from_pretrained('dhtocks/nllb-200-distilled-350M_en-ko', forced_bos_token_id=256098)
tokenizer = AutoTokenizer.from_pretrained('dhtocks/nllb-200-distilled-350M_en-ko', src_lang='eng_Latn', tgt_lang='kor_Hang')

# Tokenize the English input and generate the Korean translation
inputs = tokenizer('[YOUR_INPUT]', return_tensors="pt")
output = model.generate(**inputs)
print(tokenizer.decode(output[0]))
```
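
For convenience, the same checkpoint should also work with the `translation` pipeline; the snippet below is a sketch using the standard pipeline API with NLLB-style language codes.

```python
from transformers import pipeline

# The pipeline handles tokenization, generation, and decoding in one call
translator = pipeline(
    'translation',
    model='dhtocks/nllb-200-distilled-350M_en-ko',
    src_lang='eng_Latn',
    tgt_lang='kor_Hang',
)
print(translator('[YOUR_INPUT]')[0]['translation_text'])
```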
|
|
|
## Citation |
|
```bibtex |
|
@misc{oh2024nllb200distilled350menko,
|
title={NLLB-200 distilled_350M_en-ko}, |
|
author={Saechan Oh}, |
|
year={2024} |
|
} |
|
``` |
|
|
|
|