|
--- |
|
language: |
|
- en |
|
- de |
|
- multilingual |
|
license: cc-by-4.0 |
|
tags: |
|
- translation |
|
- opus-mt |
|
model-index: |
|
- name: opus-mt-eng-deu |
|
results: |
|
- task: |
|
type: translation |
|
name: Translation eng-deu |
|
dataset: |
|
name: Tatoeba-test.eng-deu |
|
type: tatoeba_mt |
|
args: eng-deu |
|
metrics: |
|
- type: bleu |
|
value: 45.8 |
|
name: BLEU |
|
--- |
|
|
|
# Opus Tatoeba English-German |
|
|
|
*This model was obtained by running the script [convert_marian_to_pytorch.py](https://github.com/huggingface/transformers/blob/master/src/transformers/models/marian/convert_marian_to_pytorch.py) ([instructions available here](https://github.com/huggingface/transformers/tree/main/scripts/tatoeba)). The original models were trained by [Jörg Tiedemann](https://blogs.helsinki.fi/tiedeman/) using the [MarianNMT](https://marian-nmt.github.io/) library. See all available `MarianMTModel` models on the profile of the [Helsinki NLP](https://huggingface.co/Helsinki-NLP) group. |
|
|
|
This is the conversion of checkpoint [opus-2021-02-22.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip/eng-deu/opus-2021-02-22.zip) |
|
* |
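Because the checkpoint was converted for `MarianMTModel`, it can be loaded through the `transformers` library. The snippet below is a minimal sketch, not an official usage recipe. It assumes `transformers`, `torch`, and `sentencepiece` are installed. The helper names `batched` and `translate`, the `model_id` parameter, and the default batch size are illustrative assumptions; pass the Hub id of this repository as `model_id`.

```python
from typing import Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` sentences (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(sentences: List[str], model_id: str, batch_size: int = 8) -> List[str]:
    """Translate English sentences to German with a converted Marian checkpoint.

    `model_id` is an assumption of this sketch: supply the Hub id (or a local
    path) of the converted model; it is not hard-coded here.
    """
    # Imported lazily so that `batched` works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    outputs: List[str] = []
    for chunk in batched(sentences, batch_size):
        # Pad to the longest sentence in the chunk and return PyTorch tensors.
        batch = tokenizer(chunk, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**batch)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Calling `translate(["How are you?"], model_id=...)` with this repository's id returns a list with one German string per input sentence. Batching keeps memory use bounded for longer inputs; the batch size of 8 is an arbitrary default.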
|
|
|
|
|
--- |
|
|
|
### eng-deu |
|
|
|
* source language name: English |
|
* target language name: German |
|
* OPUS readme: [README.md](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/README.md) |
|
|
|
* model: transformer |
|
* source language code: en |
|
* target language code: de |
|
* dataset: opus |
|
* release date: 2021-02-22 |
|
* pre-processing: normalization + SentencePiece (spm32k,spm32k) |
|
* download original weights: [opus-2021-02-22.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip/eng-deu/opus-2021-02-22.zip) |
|
* Training data: |
|
* deu-eng: Tatoeba-train (86845165) |
|
* Validation data: |
|
* deu-eng: Tatoeba-dev (284809) |
|
* total-size-shuffled: 284809 |
|
* devset-selected: top 5000 lines of Tatoeba-dev.src.shuffled |
|
* Test data: |
|
* newssyscomb2009.eng-deu: 502/11271 |
|
* news-test2008.eng-deu: 2051/47427 |
|
* newstest2009.eng-deu: 2525/62816 |
|
* newstest2010.eng-deu: 2489/61511 |
|
* newstest2011.eng-deu: 3003/72981 |
|
* newstest2012.eng-deu: 3003/72886 |
|
* newstest2013.eng-deu: 3000/63737 |
|
* newstest2014-deen.eng-deu: 3003/62964 |
|
* newstest2015-ende.eng-deu: 2169/44260 |
|
* newstest2016-ende.eng-deu: 2999/62670 |
|
* newstest2017-ende.eng-deu: 3004/61291 |
|
* newstest2018-ende.eng-deu: 2998/64276 |
|
* newstest2019-ende.eng-deu: 1997/48969 |
|
* Tatoeba-test.eng-deu: 10000/83347 |
|
* test set translations file: [test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip/eng-deu/opus-2021-02-22.test.txt) |
|
* test set scores file: [eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip/eng-deu/opus-2021-02-22.eval.txt) |
|
* BLEU-scores |
|
|Test set|score| |
|
|---|---| |
|
|newstest2018-ende.eng-deu|46.4| |
|
|Tatoeba-test.eng-deu|45.8| |
|
|newstest2019-ende.eng-deu|42.4| |
|
|newstest2016-ende.eng-deu|37.9| |
|
|newstest2015-ende.eng-deu|32.0| |
|
|newstest2017-ende.eng-deu|30.6| |
|
|newstest2014-deen.eng-deu|29.6| |
|
|newstest2013.eng-deu|27.6| |
|
|newstest2010.eng-deu|25.9| |
|
|news-test2008.eng-deu|23.9| |
|
|newstest2012.eng-deu|23.8| |
|
|newssyscomb2009.eng-deu|23.3| |
|
|newstest2011.eng-deu|22.9| |
|
|newstest2009.eng-deu|22.7| |
|
* chr-F-scores |
|
|Test set|score| |
|
|---|---| |
|
|newstest2018-ende.eng-deu|0.697| |
|
|newstest2019-ende.eng-deu|0.664| |
|
|Tatoeba-test.eng-deu|0.655| |
|
|newstest2016-ende.eng-deu|0.644| |
|
|newstest2015-ende.eng-deu|0.601| |
|
|newstest2014-deen.eng-deu|0.595| |
|
|newstest2017-ende.eng-deu|0.593| |
|
|newstest2013.eng-deu|0.558| |
|
|newstest2010.eng-deu|0.550| |
|
|newssyscomb2009.eng-deu|0.539| |
|
|news-test2008.eng-deu|0.533| |
|
|newstest2009.eng-deu|0.533| |
|
|newstest2012.eng-deu|0.530| |
|
|newstest2011.eng-deu|0.528| |