---
language:
- en
- de
- multilingual
license: cc-by-4.0
tags:
- translation
- opus-mt
model-index:
- name: opus-mt-eng-deu
  results:
  - task:
      type: translation
      name: Translation eng-deu
    dataset:
      name: Tatoeba-test.eng-deu
      type: tatoeba_mt
      args: eng-deu
    metrics:
    - type: bleu
      value: 45.8
      name: BLEU
---

# Opus Tatoeba English-German

*This model was obtained by running the script [convert_marian_to_pytorch.py](https://github.com/huggingface/transformers/blob/master/src/transformers/models/marian/convert_marian_to_pytorch.py) ([instructions available here](https://github.com/huggingface/transformers/tree/main/scripts/tatoeba)). The original models were trained by [Jörg Tiedemann](https://blogs.helsinki.fi/tiedeman/) using the [MarianNMT](https://marian-nmt.github.io/) library. See all available `MarianMTModel` models on the profile of the [Helsinki NLP](https://huggingface.co/Helsinki-NLP) group.*

*This is the conversion of checkpoint [opus-2021-02-22.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip/eng-deu/opus-2021-02-22.zip).*
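
Since the converted checkpoint is a `MarianMTModel`, it can in principle be loaded with the `transformers` library. The following is a minimal sketch, not the authors' reference usage: the helper names `batches` and `translate` are illustrative, and `model_id` must be supplied by the caller (a Hub repository name or a local directory holding the converted files), because this card does not state the repository path.

```python
from typing import Iterator, List


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(texts: List[str], model_id: str, batch_size: int = 8) -> List[str]:
    """Translate English sentences to German with a converted Marian checkpoint.

    `model_id` is an assumption supplied by the caller: a Hub repository
    name or a local path to the converted model files.
    """
    # Imported lazily so `batches` stays usable without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    outputs: List[str] = []
    for batch in batches(texts, batch_size):
        # Pad to the longest sentence in the batch; truncate overlong inputs.
        encoded = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**encoded)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Calling `translate(["How are you?"], model_id)` with a valid `model_id` should return a list containing one German sentence. Batching keeps memory use bounded when translating many sentences.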


---

### eng-deu

* source language name: English
* target language name: German
* OPUS readme: [README.md](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/README.md)

* model: transformer
* source language code: en
* target language code: de
* dataset: opus 
* release date: 2021-02-22
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* download original weights: [opus-2021-02-22.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip/eng-deu/opus-2021-02-22.zip)
* Training data: 
  * deu-eng: Tatoeba-train (86845165)
* Validation data: 
  * deu-eng: Tatoeba-dev, 284809
  * total-size-shuffled: 284809
  * devset-selected: top 5000 lines of Tatoeba-dev.src.shuffled
* Test data: 
  * newssyscomb2009.eng-deu: 502/11271
  * news-test2008.eng-deu: 2051/47427
  * newstest2009.eng-deu: 2525/62816
  * newstest2010.eng-deu: 2489/61511
  * newstest2011.eng-deu: 3003/72981
  * newstest2012.eng-deu: 3003/72886
  * newstest2013.eng-deu: 3000/63737
  * newstest2014-deen.eng-deu: 3003/62964
  * newstest2015-ende.eng-deu: 2169/44260
  * newstest2016-ende.eng-deu: 2999/62670
  * newstest2017-ende.eng-deu: 3004/61291
  * newstest2018-ende.eng-deu: 2998/64276
  * newstest2019-ende.eng-deu: 1997/48969
  * Tatoeba-test.eng-deu: 10000/83347
* test set translations file: [test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip/eng-deu/opus-2021-02-22.test.txt)
* test set scores file: [eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip/eng-deu/opus-2021-02-22.eval.txt)
* BLEU scores:

|Test set|BLEU|
|---|---|
|newstest2018-ende.eng-deu|46.4|
|Tatoeba-test.eng-deu|45.8|
|newstest2019-ende.eng-deu|42.4|
|newstest2016-ende.eng-deu|37.9|
|newstest2015-ende.eng-deu|32.0|
|newstest2017-ende.eng-deu|30.6|
|newstest2014-deen.eng-deu|29.6|
|newstest2013.eng-deu|27.6|
|newstest2010.eng-deu|25.9|
|news-test2008.eng-deu|23.9|
|newstest2012.eng-deu|23.8|
|newssyscomb2009.eng-deu|23.3|
|newstest2011.eng-deu|22.9|
|newstest2009.eng-deu|22.7|

* chr-F scores:

|Test set|chr-F|
|---|---|
|newstest2018-ende.eng-deu|0.697|
|newstest2019-ende.eng-deu|0.664|
|Tatoeba-test.eng-deu|0.655|
|newstest2016-ende.eng-deu|0.644|
|newstest2015-ende.eng-deu|0.601|
|newstest2014-deen.eng-deu|0.595|
|newstest2017-ende.eng-deu|0.593|
|newstest2013.eng-deu|0.558|
|newstest2010.eng-deu|0.55|
|newssyscomb2009.eng-deu|0.539|
|news-test2008.eng-deu|0.533|
|newstest2009.eng-deu|0.533|
|newstest2012.eng-deu|0.53|
|newstest2011.eng-deu|0.528|
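
To give an intuition for the chr-F column, here is a simplified, sentence-level sketch of a character n-gram F-score. It is an illustration only: the function names, the whitespace stripping, and the defaults (`max_n=6`, `beta=2.0`) are assumptions, and it will not reproduce the scores reported above, which come from the official evaluation files.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of `text` after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chr-F: F-beta over averaged character n-gram precision/recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An exact match scores 1.0 and strings with no shared characters score 0.0; partial overlaps fall in between, on the same 0 to 1 scale as the table.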