---
language:
- en
- de
tags:
- translation
- opus-mt
license: cc-by-4.0
model-index:
- name: opus-mt-eng-deu
  results:
  - task:
      name: Translation eng-deu
      type: translation
      args: eng-deu
    dataset:
      name: tatoeba-test-v2021-02-22
      type: tatoeba_mt
      args: eng-deu
    metrics:
    - name: BLEU
      type: bleu
      value: 45.8
---

# Opus Tatoeba English-German

*This model was obtained by running the script [convert_marian_to_pytorch.py](https://github.com/huggingface/transformers/blob/master/src/transformers/models/marian/convert_marian_to_pytorch.py) ([instructions available here](https://github.com/huggingface/transformers/tree/main/scripts/tatoeba)). The original models were trained by [Jörg Tiedemann](https://blogs.helsinki.fi/tiedeman/) using the [MarianNMT](https://marian-nmt.github.io/) library. See all available `MarianMTModel` models on the profile of the [Helsinki-NLP](https://huggingface.co/Helsinki-NLP) group. This model is a conversion of the checkpoint [opus-2021-02-22.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip).*

---

### eng-deu

* source language name: English
* target language name: German
* OPUS readme: [README.md](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/README.md)
* model: transformer
* source language code: en
* target language code: de
* dataset: opus
* release date: 2021-02-22
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* download original weights: [opus-2021-02-22.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip)
* Training data:
  * deu-eng: Tatoeba-train (86845165)
* Validation data:
  * deu-eng: Tatoeba-dev, 284809
  * total size (shuffled): 284809
  * devset selected: top 5000 lines of Tatoeba-dev.src.shuffled
* Test data:
  * newssyscomb2009.eng-deu: 502/11271
  * news-test2008.eng-deu: 2051/47427
  * newstest2009.eng-deu: 2525/62816
  * newstest2010.eng-deu: 2489/61511
  * newstest2011.eng-deu: 3003/72981
  * newstest2012.eng-deu: 3003/72886
  * newstest2013.eng-deu: 3000/63737
  * newstest2014-deen.eng-deu: 3003/62964
  * newstest2015-ende.eng-deu: 2169/44260
  * newstest2016-ende.eng-deu: 2999/62670
  * newstest2017-ende.eng-deu: 3004/61291
  * newstest2018-ende.eng-deu: 2998/64276
  * newstest2019-ende.eng-deu: 1997/48969
  * Tatoeba-test.eng-deu: 10000/83347
* test set translations file: [test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.test.txt)
* test set scores file: [eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.eval.txt)
* BLEU scores

|Test set|score|
|---|---|
|newstest2018-ende.eng-deu|46.4|
|Tatoeba-test.eng-deu|45.8|
|newstest2019-ende.eng-deu|42.4|
|newstest2016-ende.eng-deu|37.9|
|newstest2015-ende.eng-deu|32.0|
|newstest2017-ende.eng-deu|30.6|
|newstest2014-deen.eng-deu|29.6|
|newstest2013.eng-deu|27.6|
|newstest2010.eng-deu|25.9|
|news-test2008.eng-deu|23.9|
|newstest2012.eng-deu|23.8|
|newssyscomb2009.eng-deu|23.3|
|newstest2011.eng-deu|22.9|
|newstest2009.eng-deu|22.7|
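The BLEU figures above come from the OPUS-MT evaluation pipeline on the referenced test sets. As a rough, stdlib-only illustration of what the metric computes — clipped n-gram precision combined with a brevity penalty — here is a minimal sentence-level sketch (not the actual corpus-level sacrebleu implementation used for these scores):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Toy sentence-level BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty. No smoothing, so any n-gram order with zero
    overlap (or a candidate shorter than max_n tokens) scores 0."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped counts
        precisions.append(overlap / max(sum(cand_ngrams.values()), 1))
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 100.0
```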
* chr-F scores

|Test set|score|
|---|---|
|newstest2018-ende.eng-deu|0.697|
|newstest2019-ende.eng-deu|0.664|
|Tatoeba-test.eng-deu|0.655|
|newstest2016-ende.eng-deu|0.644|
|newstest2015-ende.eng-deu|0.601|
|newstest2014-deen.eng-deu|0.595|
|newstest2017-ende.eng-deu|0.593|
|newstest2013.eng-deu|0.558|
|newstest2010.eng-deu|0.55|
|newssyscomb2009.eng-deu|0.539|
|news-test2008.eng-deu|0.533|
|newstest2009.eng-deu|0.533|
|newstest2012.eng-deu|0.53|
|newstest2011.eng-deu|0.528|
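chr-F is similar in spirit to BLEU but operates on character n-grams (typically up to order 6) and uses an F-score that weights recall more heavily (β = 2), which makes it more forgiving of morphological variation — useful for German. A minimal stdlib sketch of the idea, not the exact implementation behind the table above:

```python
from collections import Counter

def char_ngrams(text, n):
    """Multiset of character n-grams, ignoring spaces."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(candidate, reference, max_n=6, beta=2.0):
    """Toy chr-F: average of F_beta scores over character n-gram orders
    1..max_n, with recall weighted beta^2 times as much as precision."""
    scores = []
    for n in range(1, max_n + 1):
        cand, ref = char_ngrams(candidate, n), char_ngrams(reference, n)
        overlap = sum((cand & ref).values())
        precision = overlap / max(sum(cand.values()), 1)
        recall = overlap / max(sum(ref.values()), 1)
        denom = beta ** 2 * precision + recall
        scores.append((1 + beta ** 2) * precision * recall / denom if denom else 0.0)
    return sum(scores) / max_n

print(chrf("Das ist ein Test.", "Das ist ein Test."))  # 1.0
```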