Update README.md
README.md
CHANGED
@@ -228,6 +228,11 @@ Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarle
Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers,
Safiyyah Saleem, Holger Schwenk, and Jeff Wang.

+## Training:
+
+- Expert Output Masking (EOM) is used for training. It consists of dropping the full contribution for some tokens. This corresponds to the following scheme:
+![EOM](https://drive.google.com/uc?id=1VNr3Ug5mQT4uFlvMDaTEyfg9rwbwGFsl/view?usp=sharing)
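
The masking idea described in the added bullet can be sketched roughly as follows. This is a minimal illustration, not the training code behind the checkpoints: the function name `expert_output_masking`, the list-of-lists token representation, the residual-sum combination, and the default `p_eom=0.2` are all assumptions made for the example.

```python
import random


def expert_output_masking(expert_outputs, residuals, p_eom=0.2, rng=None):
    """Combine per-token expert outputs with the residual stream.

    Hypothetical sketch: with probability p_eom, the whole expert
    contribution for a token is dropped, so only the residual remains.
    """
    rng = rng or random.Random()
    combined, dropped = [], []
    for expert_out, residual in zip(expert_outputs, residuals):
        drop = rng.random() < p_eom  # decided once per token
        dropped.append(drop)
        if drop:
            # Masked token: the expert output contributes nothing.
            combined.append(list(residual))
        else:
            # Unmasked token: residual plus the expert output.
            combined.append([r + e for r, e in zip(residual, expert_out)])
    return combined, dropped
```

At inference time no masking would be applied, which in this sketch corresponds to calling the function with `p_eom=0.0`.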
+
## Generating with NLLB-MoE
The available checkpoints require around 350GB of storage. Make sure to use `accelerate` if you do not have enough RAM on your machine.
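
As a rough sketch of what using `accelerate` can look like here: with `accelerate` installed, `transformers` accepts `device_map="auto"` in `from_pretrained`, which spreads the weights across the available GPUs, CPU RAM, and disk. The checkpoint id `facebook/nllb-moe-54b` and the helper name `load_nllb_moe` are assumptions for this example and are not stated in the lines above.

```python
def load_nllb_moe(checkpoint="facebook/nllb-moe-54b"):
    """Load tokenizer and model, letting accelerate place the weights.

    Assumes `transformers` and `accelerate` are installed; the default
    checkpoint id is an assumption and can be replaced.
    """
    # Imported lazily so that defining the helper has no heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # device_map="auto" requires accelerate and dispatches layers across
    # the devices that are available instead of loading everything in RAM.
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, device_map="auto")
    return tokenizer, model
```

Calling `load_nllb_moe()` will download the full checkpoint, so it is worth confirming the free disk space first.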