---
library_name: transformers
license: cc-by-nc-4.0
datasets:
- allenai/nllb
- facebook/flores
language:
- ko
- en
metrics:
- chrf
pipeline_tag: translation
---

# NLLB-200 Distilled-350M_en2ko

The NLLB-200 model showed outstanding performance on translation tasks and helped address the challenges of low-resource languages.
Even so, it is still hard to run the 600M-parameter or larger (1B+) models without a capable computing environment.
So I built a much smaller model specialized in translating English to Korean. You can also run it on CPU (no mixed precision, no quantization).



## Model

- Model: based on NLLB-200 distilled 600M
  - **Parameters: 350,537,728 (350M)**
  - **Encoder layers: 12 -> 3**
  - **Decoder layers: 12 -> 3**
  - FFN dimension: 4096 (same)
  - Embed dimension: 1024 (same)
  - Vocab size: 256206 (same)

- License: CC-BY-NC-4.0
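The architecture numbers above account exactly for the reported parameter total. As a sanity check, assuming the standard M2M-100/NLLB layout (sinusoidal positional encodings with no learned parameters, one embedding table shared between encoder, decoder, and output head, biased attention/FFN projections, and a LayerNorm per sub-layer plus a final one per stack), the count can be reproduced arithmetically:

```python
d_model, d_ffn, vocab = 1024, 4096, 256206
enc_layers = dec_layers = 3

ln = 2 * d_model                              # LayerNorm: weight + bias
attn = 4 * (d_model * d_model + d_model)      # Q, K, V, out projections with bias
ffn = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)

enc_layer = attn + ffn + 2 * ln               # self-attn + FFN, 2 LayerNorms
dec_layer = 2 * attn + ffn + 3 * ln           # self-attn + cross-attn + FFN, 3 LayerNorms

total = (
    vocab * d_model                           # shared embedding / tied output head
    + enc_layers * enc_layer
    + dec_layers * dec_layer
    + 2 * ln                                  # final LayerNorm of each stack
)
print(total)  # 350537728
```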

## Data

- Training Data: [NLLB dataset](https://huggingface.co/datasets/allenai/nllb)
- Evaluation Data: [Flores-200 dataset](https://huggingface.co/datasets/facebook/flores)

## Metrics

- CPU: Intel (R) Xeon(R) CPU @ 2.20GHz (16 cores)
- GPU: NVIDIA L4 24GB
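The inference times below are wall-clock latencies per sentence (the measurement script itself is not released). A generic way to measure such latency, shown here with a stand-in workload, is:

```python
import time

def mean_latency(fn, *args, warmup=2, runs=10):
    """Average wall-clock seconds per call, after a few warm-up runs."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs

# Stand-in workload; in practice, use e.g. lambda: model.generate(**inputs)
print(f"{mean_latency(time.sleep, 0.01):.4f} s")
```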



|                        | #Params | chrF(++) | GPU Inference time (s) | CPU Inference time (s) |
| ---------------------- | ------- | -------- | ---------------------- | ---------------------- |
| NLLB-200 3.3B          | 3.3B    | 34.3     | 0.98 s                 | 4.65 s                 |
| NLLB-200 1.3B          | 1.3B    | 32.1     | 0.89 s                 | 2.46 s                 |
| NLLB-200 600M          | 600M    | 32.0     | 0.43 s                 | 1.52 s                 |
| NLLB-200 350M (*ours*) | 350M    | 24.6     | 0.24 s                 | 1.43 s                 |
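chrF, the metric in the table, is a character n-gram F-score. A simplified sentence-level illustration is below; note that real chrF strips whitespace, aggregates counts at the corpus level, and in its chrF++ variant adds word n-grams, so use sacrebleu for actual evaluation:

```python
from collections import Counter

def chrf(hypothesis, reference, n_max=6, beta=2.0):
    """Simplified sentence-level chrF: averaged character n-gram F-beta score."""
    def ngrams(text, n):
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))

    f_scores = []
    for n in range(1, n_max + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        if not hyp or not ref:
            continue  # string shorter than n: skip this order
        overlap = sum((hyp & ref).values())
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            f_scores.append(0.0)
        else:
            f_scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return 100 * sum(f_scores) / len(f_scores) if f_scores else 0.0

print(chrf("hello world", "hello world"))  # 100.0
```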


## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# forced_bos_token_id=256098 is the id of the kor_Hang target-language tag
model = AutoModelForSeq2SeqLM.from_pretrained('dhtocks/nllb-200-distilled-350M_en-ko', forced_bos_token_id=256098)
tokenizer = AutoTokenizer.from_pretrained('dhtocks/nllb-200-distilled-350M_en-ko', src_lang='eng_Latn', tgt_lang='kor_Hang')

inputs = tokenizer('[YOUR_INPUT]', return_tensors='pt')
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Citation
```bibtex
@misc{oh2024nllb350m,
  title={NLLB-200 distilled_350M_en-ko},
  author={Saechan Oh},
  year={2024}
}
```