---
language:
- en
- de
tags:
- translation
- opus-mt
license: cc-by-4.0
model-index:
- name: opus-mt-eng-deu
  results:
  - task:
      name: Translation eng-deu
      type: translation
      args: eng-deu
    dataset:
      name: tatoeba-test-v2021-02-22
      type: tatoeba_mt
      args: eng-deu
    metrics:
    - name: BLEU
      type: bleu
      value: 45.8
---

# Opus Tatoeba English-German

*This model was obtained by running the script [convert_marian_to_pytorch.py](https://github.com/huggingface/transformers/blob/master/src/transformers/models/marian/convert_marian_to_pytorch.py) ([instructions available here](https://github.com/huggingface/transformers/tree/main/scripts/tatoeba)). The original models were trained by [Jörg Tiedemann](https://blogs.helsinki.fi/tiedeman/) using the [MarianNMT](https://marian-nmt.github.io/) library. See all available `MarianMTModel` models on the profile of the [Helsinki-NLP](https://huggingface.co/Helsinki-NLP) group. This is the conversion of the checkpoint [opus-2021-02-22.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip/eng-deu/opus-2021-02-22.zip).*

---

### eng-deu

* source language name: English
* target language name: German
* OPUS readme: [README.md](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/README.md)

* model: transformer
* source language code: en
* target language code: de
* dataset: opus
* release date: 2021-02-22
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* download original weights: [opus-2021-02-22.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip/eng-deu/opus-2021-02-22.zip)
* Training data:
  * deu-eng: Tatoeba-train (86845165)
* Validation data:
  * deu-eng: Tatoeba-dev, 284809
  * total-size-shuffled: 284809
  * devset-selected: top 5000 lines of Tatoeba-dev.src.shuffled!
* Test data:
  * newssyscomb2009.eng-deu: 502/11271
  * news-test2008.eng-deu: 2051/47427
  * newstest2009.eng-deu: 2525/62816
  * newstest2010.eng-deu: 2489/61511
  * newstest2011.eng-deu: 3003/72981
  * newstest2012.eng-deu: 3003/72886
  * newstest2013.eng-deu: 3000/63737
  * newstest2014-deen.eng-deu: 3003/62964
  * newstest2015-ende.eng-deu: 2169/44260
  * newstest2016-ende.eng-deu: 2999/62670
  * newstest2017-ende.eng-deu: 3004/61291
  * newstest2018-ende.eng-deu: 2998/64276
  * newstest2019-ende.eng-deu: 1997/48969
  * Tatoeba-test.eng-deu: 10000/83347
* test set translations file: [test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip/eng-deu/opus-2021-02-22.test.txt)
* test set scores file: [eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-deu/opus-2021-02-22.zip/eng-deu/opus-2021-02-22.eval.txt)
* BLEU scores

|Test set|score|
|---|---|
|newstest2018-ende.eng-deu|46.4|
|Tatoeba-test.eng-deu|45.8|
|newstest2019-ende.eng-deu|42.4|
|newstest2016-ende.eng-deu|37.9|
|newstest2015-ende.eng-deu|32.0|
|newstest2017-ende.eng-deu|30.6|
|newstest2014-deen.eng-deu|29.6|
|newstest2013.eng-deu|27.6|
|newstest2010.eng-deu|25.9|
|news-test2008.eng-deu|23.9|
|newstest2012.eng-deu|23.8|
|newssyscomb2009.eng-deu|23.3|
|newstest2011.eng-deu|22.9|
|newstest2009.eng-deu|22.7|

* chr-F scores

|Test set|score|
|---|---|
|newstest2018-ende.eng-deu|0.697|
|newstest2019-ende.eng-deu|0.664|
|Tatoeba-test.eng-deu|0.655|
|newstest2016-ende.eng-deu|0.644|
|newstest2015-ende.eng-deu|0.601|
|newstest2014-deen.eng-deu|0.595|
|newstest2017-ende.eng-deu|0.593|
|newstest2013.eng-deu|0.558|
|newstest2010.eng-deu|0.55|
|newssyscomb2009.eng-deu|0.539|
|news-test2008.eng-deu|0.533|
|newstest2009.eng-deu|0.533|
|newstest2012.eng-deu|0.53|
|newstest2011.eng-deu|0.528|
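The chr-F numbers above are character n-gram F-scores. The official values come from the OPUS evaluation pipeline; the snippet below is only a simplified, self-contained sketch of the metric's idea — it macro-averages character n-gram precision and recall up to order 6 and combines them with β = 2, without the whitespace handling and other details of the reference (sacrebleu) implementation, so it will not reproduce the table's numbers exactly.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count all character n-grams of length n in text."""
    return Counter(text[i : i + n] for i in range(len(text) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified chr-F: macro-averaged char n-gram precision/recall, F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp and not ref:
            continue  # skip orders longer than both strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / max(sum(hyp.values()), 1))
        recalls.append(overlap / max(sum(ref.values()), 1))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)
```

Identical hypothesis and reference score 1.0; fully disjoint strings score 0.0.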