Update README.md
README.md CHANGED
@@ -61,7 +61,11 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 
 # Training
 
-
+The model developers write:
+
+> In all experiments, we use a Transformer architecture with 1024 hidden units, 8 heads, GELU activations (Hendrycks and Gimpel, 2016), a dropout rate of 0.1 and learned positional embeddings. We train our models with the Adam optimizer (Kingma and Ba, 2014), a linear warm-up (Vaswani et al., 2017) and learning rates varying from 10^−4 to 5·10^−4.
+
+See the [associated paper](https://arxiv.org/pdf/1901.07291.pdf) for links, citations, and further details on the training data and training procedure.
 
 The model developers also write that:
 
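As an illustration of the quoted configuration, the sketch below maps those hyperparameters onto Hugging Face's `XLMConfig` and a standard PyTorch Adam-plus-linear-warm-up setup. This is a minimal sketch, not the authors' training script: `vocab_size`, `n_layers`, and the warm-up/step counts are placeholders that the quote does not specify.

```python
# Minimal sketch (not the authors' code) of the quoted configuration,
# expressed with Hugging Face's XLMConfig and PyTorch's Adam optimizer.
from torch.optim import Adam
from transformers import XLMConfig, XLMModel, get_linear_schedule_with_warmup

config = XLMConfig(
    emb_dim=1024,                 # "1024 hidden units"
    n_heads=8,                    # "8 heads"
    gelu_activation=True,         # GELU activations (Hendrycks and Gimpel, 2016)
    dropout=0.1,                  # dropout rate of 0.1
    attention_dropout=0.1,        # assumption: same rate on attention weights
    sinusoidal_embeddings=False,  # i.e. learned positional embeddings
    n_layers=12,                  # placeholder: depth is not given in the quote
    vocab_size=30145,             # placeholder vocabulary size
)
model = XLMModel(config)

# Adam with a linear warm-up; the quote gives learning rates from 10^−4 to 5·10^−4.
optimizer = Adam(model.parameters(), lr=1e-4)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=4_000,       # placeholder: warm-up length is not specified
    num_training_steps=100_000,   # placeholder total step count
)
```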