---
language:
- en
- de
- multilingual
license: cc-by-4.0
tags:
- translation
- opus-mt
model-index:
- name: opus-mt-eng-deu
results:
- task:
type: translation
name: Translation eng-deu
dataset:
name: Tatoeba-test.eng-deu
type: tatoeba_mt
args: eng-deu
metrics:
- type: bleu
value: 45.8
name: BLEU
---
# Opus Tatoeba English-German
*This model was obtained by running the script [convert_marian_to_pytorch.py](https://github.com/huggingface/transformers/blob/master/src/transformers/models/marian/convert_marian_to_pytorch.py) ([instructions available here](https://github.com/huggingface/transformers/tree/main/scripts/tatoeba)). The original models were trained by [Jörg Tiedemann](https://blogs.helsinki.fi/tiedeman/) using the [MarianNMT](https://marian-nmt.github.io/) library. See all available `MarianMTModel` models on the profile of the [Helsinki NLP](https://huggingface.co/Helsinki-NLP) group. This is the conversion of checkpoint [opus-2021-02-22.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip).*
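As a converted `MarianMTModel`, the checkpoint can be used with the `transformers` library. A minimal sketch of loading and translating follows; the repository ID below is a placeholder, so substitute the actual Hub ID of this conversion:

```python
from transformers import MarianMTModel, MarianTokenizer

# Placeholder repo ID -- replace with the actual Hub ID of this model.
model_name = "<namespace>/opus-mt-eng-deu"

# The tokenizer applies the SentencePiece models (spm32k) used in training.
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer(["How are you doing today?"], return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
# -> e.g. ["Wie geht es dir heute?"]
```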
---
### eng-deu
* source language name: English
* target language name: German
* OPUS readme: [README.md](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/README.md)
* model: transformer
* source language code: en
* target language code: de
* dataset: opus
* release date: 2021-02-22
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* download original weights: [opus-2021-02-22.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip)
* Training data (sentence pairs):
* deu-eng: Tatoeba-train (86845165)
* Validation data:
* deu-eng: Tatoeba-dev, 284809
* total-size-shuffled: 284809
* devset-selected: top 5000 lines of Tatoeba-dev.src.shuffled!
* Test data (sentences/words):
* newssyscomb2009.eng-deu: 502/11271
* news-test2008.eng-deu: 2051/47427
* newstest2009.eng-deu: 2525/62816
* newstest2010.eng-deu: 2489/61511
* newstest2011.eng-deu: 3003/72981
* newstest2012.eng-deu: 3003/72886
* newstest2013.eng-deu: 3000/63737
* newstest2014-deen.eng-deu: 3003/62964
* newstest2015-ende.eng-deu: 2169/44260
* newstest2016-ende.eng-deu: 2999/62670
* newstest2017-ende.eng-deu: 3004/61291
* newstest2018-ende.eng-deu: 2998/64276
* newstest2019-ende.eng-deu: 1997/48969
* Tatoeba-test.eng-deu: 10000/83347
* test set translations file: [test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.test.txt)
* test set scores file: [eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.eval.txt) (a scoring sketch follows the tables below)
* BLEU scores

|Test set|score|
|---|---|
|newstest2018-ende.eng-deu|46.4|
|Tatoeba-test.eng-deu|45.8|
|newstest2019-ende.eng-deu|42.4|
|newstest2016-ende.eng-deu|37.9|
|newstest2015-ende.eng-deu|32.0|
|newstest2017-ende.eng-deu|30.6|
|newstest2014-deen.eng-deu|29.6|
|newstest2013.eng-deu|27.6|
|newstest2010.eng-deu|25.9|
|news-test2008.eng-deu|23.9|
|newstest2012.eng-deu|23.8|
|newssyscomb2009.eng-deu|23.3|
|newstest2011.eng-deu|22.9|
|newstest2009.eng-deu|22.7|
* chr-F scores

|Test set|score|
|---|---|
|newstest2018-ende.eng-deu|0.697|
|newstest2019-ende.eng-deu|0.664|
|Tatoeba-test.eng-deu|0.655|
|newstest2016-ende.eng-deu|0.644|
|newstest2015-ende.eng-deu|0.601|
|newstest2014-deen.eng-deu|0.595|
|newstest2017-ende.eng-deu|0.593|
|newstest2013.eng-deu|0.558|
|newstest2010.eng-deu|0.550|
|newssyscomb2009.eng-deu|0.539|
|news-test2008.eng-deu|0.533|
|newstest2009.eng-deu|0.533|
|newstest2012.eng-deu|0.530|
|newstest2011.eng-deu|0.528|
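The scores above come from the `eval.txt` file linked earlier and can in principle be recomputed from the test set translations with [sacreBLEU](https://github.com/mjpost/sacrebleu). A minimal sketch, assuming the hypotheses and references have been extracted into plain-text files, one sentence per line (the file names here are hypothetical):

```python
import sacrebleu

# Hypothetical file names: one sentence per line, extracted from the
# test set translations file linked above.
with open("hyp.deu") as f:
    hyps = [line.rstrip("\n") for line in f]
with open("ref.deu") as f:
    refs = [line.rstrip("\n") for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs])
chrf = sacrebleu.corpus_chrf(hyps, [refs])

# Note: Marian's eval.txt reports chr-F on a 0-1 scale, while recent
# sacreBLEU versions report it on a 0-100 scale; divide by 100 to compare.
print(f"BLEU  = {bleu.score:.1f}")
print(f"chr-F = {chrf.score:.3f}")
```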