tiedeman Ezi committed on
Commit
86b3ded
1 Parent(s): 6c00b32

Model Card (#1)

- Model Card (0161e478c48dbb4dd22fb4a03a04816533fb56f2)


Co-authored-by: Ezi Ozoani <[email protected]>

Files changed (1)
  1. README.md +78 -6
README.md CHANGED
@@ -6,18 +6,63 @@ license: apache-2.0
6
 
7
  ### opus-mt-en-de
8
 
9
- * source languages: en
10
- * target languages: de
11
- * OPUS readme: [en-de](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/en-de/README.md)
12
 
13
- * dataset: opus
14
- * model: transformer-align
15
  * pre-processing: normalization + SentencePiece
 
 
16
  * download original weights: [opus-2020-02-26.zip](https://object.pouta.csc.fi/OPUS-MT-models/en-de/opus-2020-02-26.zip)
 
17
  * test set translations: [opus-2020-02-26.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-de/opus-2020-02-26.test.txt)
 
 
 
 
 
18
  * test set scores: [opus-2020-02-26.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-de/opus-2020-02-26.eval.txt)
19
 
20
- ## Benchmarks
 
21
 
22
  | testset | BLEU | chr-F |
23
  |-----------------------|-------|-------|
@@ -35,3 +80,30 @@ license: apache-2.0
35
  | newstest2019-ende.en.de | 40.9 | 0.654 |
36
  | Tatoeba.en.de | 47.3 | 0.664 |
37
 
6
 
7
  ### opus-mt-en-de
8
 
 
 
 
9
 
10
+ ## Table of Contents
11
+ - [Model Details](#model-details)
12
+ - [Uses](#uses)
13
+ - [Risks, Limitations and Biases](#risks-limitations-and-biases)
14
+ - [Training](#training)
15
+ - [Evaluation](#evaluation)
16
+ - [Citation Information](#citation-information)
17
+ - [How to Get Started With the Model](#how-to-get-started-with-the-model)
18
+
19
+ ## Model Details
20
+ **Model Description:**
21
+ - **Developed by:** Language Technology Research Group at the University of Helsinki
22
+ - **Model Type:** Translation
23
+ - **Language(s):**
24
+ - Source Language: English
25
+ - Target Language: German
26
+ - **License:** Apache-2.0
27
+ - **Resources for more information:**
28
+ - [GitHub Repo](https://github.com/Helsinki-NLP/OPUS-MT-train)
29
+
30
+
31
+ ## Uses
32
+
33
+ #### Direct Use
34
+
35
+ This model can be used for English-to-German translation and, more generally, text-to-text generation.
36
+
37
+
38
+ ## Risks, Limitations and Biases
39
+
40
+
41
+
42
+ **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**
43
+
44
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
45
+
46
+ Further details about the dataset for this model can be found in the OPUS readme: [en-de](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/en-de/README.md)
47
+
48
+
49
+ #### Training Data
50
+ ##### Preprocessing
51
  * pre-processing: normalization + SentencePiece
52
+
53
+ * dataset: [opus](https://github.com/Helsinki-NLP/Opus-MT)
54
  * download original weights: [opus-2020-02-26.zip](https://object.pouta.csc.fi/OPUS-MT-models/en-de/opus-2020-02-26.zip)
55
+
56
  * test set translations: [opus-2020-02-26.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-de/opus-2020-02-26.test.txt)
57
+
58
+ ## Evaluation
59
+
60
+ #### Results
61
+
62
  * test set scores: [opus-2020-02-26.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-de/opus-2020-02-26.eval.txt)
63
 
64
+
65
+ #### Benchmarks
66
 
67
  | testset | BLEU | chr-F |
68
  |-----------------------|-------|-------|
 
80
  | newstest2019-ende.en.de | 40.9 | 0.654 |
81
  | Tatoeba.en.de | 47.3 | 0.664 |
82
 
83
+
84
+
85
+ ## Citation Information
86
+
87
+ ```bibtex
88
+ @InProceedings{TiedemannThottingal:EAMT2020,
89
+ author = {J{\"o}rg Tiedemann and Santhosh Thottingal},
90
+ title = {{OPUS-MT} — {B}uilding open translation services for the {W}orld},
91
+ booktitle = {Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)},
92
+ year = {2020},
93
+ address = {Lisbon, Portugal}
94
+ }
95
+ ```
96
+
97
+ ## How to Get Started With the Model
98
+ ```python
99
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
100
+
101
+ tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
102
+
103
+ model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-de")
104
+
105
+ ```
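
The snippet above stops once the tokenizer and model are loaded. As a minimal sketch of the next step, assuming `transformers`, `torch`, and `sentencepiece` are installed, a translation call could look roughly like this. The helper names `load`, `translate`, and `main`, the example sentence, and the `max_new_tokens` value are illustrative assumptions and are not part of the model card.

```python
from typing import List

# Checkpoint name taken from the model card.
MODEL_NAME = "Helsinki-NLP/opus-mt-en-de"


def load(model_name: str = MODEL_NAME):
    """Load tokenizer and model (downloads the weights on first use)."""
    # Imported lazily so that translate() can be exercised without the library.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model


def translate(sentences: List[str], tokenizer, model, max_new_tokens: int = 128) -> List[str]:
    """Translate a batch of English sentences and return the decoded strings."""
    # Pad to the longest sentence so the batch forms a rectangular tensor.
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    # max_new_tokens is an assumed cap on output length, not a documented setting.
    generated = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


def main() -> None:
    tokenizer, model = load()
    for line in translate(["The weather is nice today."], tokenizer, model):
        print(line)
```

Calling `main()` downloads the checkpoint and prints the German translation of the example sentence. Keeping `translate` separate from `load` lets the same tokenizer and model be reused across batches.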
106
+
107
+
108
+
109
+