Typo corrections

README.md (CHANGED)
@@ -5,7 +5,7 @@ datasets:
- wikipedia
---

# FrALBERT Base

Pretrained model on French language using a masked language modeling (MLM) objective. It was introduced in
[this paper](https://arxiv.org/abs/1909.11942) and first released in
@@ -14,7 +14,7 @@ between french and French.

## Model description

FrALBERT is a transformers model pretrained on 4 GB of French Wikipedia in a self-supervised fashion. This means it
was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
publicly available data), with an automatic process to generate inputs and labels from those texts. More precisely, it
was pretrained with two objectives:
@@ -24,13 +24,13 @@ was pretrained with two objectives:
recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like
GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the
sentence.
- Sentence Ordering Prediction (SOP): FrALBERT uses a pretraining loss based on predicting the ordering of two consecutive segments of text.
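
To make the masked language modeling objective concrete, here is a minimal fill-mask sketch using the `transformers` pipeline; the checkpoint name is a placeholder for this repository's hub ID.

```python
# Fill in a masked token with the pretrained MLM head.
from transformers import pipeline

# Placeholder: replace with this repository's model ID on the Hub.
fill_mask = pipeline("fill-mask", model="path/to/fralbert-base")

# ALBERT-style tokenizers use [MASK] as the mask token.
for prediction in fill_mask("Paris est la [MASK] de la France."):
    print(prediction["token_str"], round(prediction["score"], 3))
```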

This way, the model learns an inner representation of the French language that can then be used to extract features
useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard
classifier using the features produced by the FrALBERT model as inputs.
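
As a rough sketch of this feature-extraction use, the snippet below pulls one vector per sentence out of the model with the `transformers` library; the checkpoint name is a placeholder for this repository's hub ID, and taking the first token's hidden state is just one common pooling convention.

```python
# Extract sentence-level features that a downstream classifier could consume.
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder: replace with this repository's model ID on the Hub.
checkpoint = "path/to/fralbert-base"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

encoded = tokenizer(["Paris est la capitale de la France."], return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)

# Hidden state of the first ([CLS]) token as a fixed-size sentence feature.
features = output.last_hidden_state[:, 0]
print(features.shape)  # (batch_size, hidden_size)
```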

FrALBERT is particular in that it shares its layers across its Transformer. Therefore, all layers have the same weights. Using repeating layers results in a small memory footprint; however, the computational cost remains similar to a BERT-like architecture with the same number of hidden layers, as it has to iterate through the same number of (repeating) layers.
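
A minimal way to see this weight sharing, assuming the standard `transformers` ALBERT implementation and a placeholder checkpoint name: the configuration exposes the number of layer iterations separately from the number of shared parameter groups.

```python
# Inspect the ALBERT-style parameter sharing from the model configuration.
from transformers import AutoConfig, AutoModel

# Placeholder: replace with this repository's model ID on the Hub.
checkpoint = "path/to/fralbert-base"

config = AutoConfig.from_pretrained(checkpoint)
print(config.num_hidden_layers)  # how many times the shared stack is applied
print(config.num_hidden_groups)  # how many distinct sets of layer weights exist (1 for ALBERT)

# Because the layers are reused, the parameter count stays small even as the stack is applied many times.
model = AutoModel.from_pretrained(checkpoint)
print(sum(p.numel() for p in model.parameters()))
```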

This is the first version of the base model.
@@ -87,7 +87,7 @@ output = model(encoded_input)

## Training data

The FrALBERT model was pretrained on 4 GB of [French Wikipedia](https://fr.wikipedia.org/wiki/French_Wikipedia) (excluding lists, tables and
headers).
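
For illustration only, here is a sketch of loading a French Wikipedia snapshot with the `datasets` library. The card does not state which dump was used, so the `20220301.fr` configuration is an assumption, and the raw articles would still need the filtering of lists, tables and headers described above.

```python
# Load a preprocessed French Wikipedia snapshot (the dump date is an assumption).
# Note: very recent `datasets` releases may require the parquet-based
# "wikimedia/wikipedia" dataset instead of the script-based "wikipedia" one.
from datasets import load_dataset

wiki_fr = load_dataset("wikipedia", "20220301.fr", split="train")

article = wiki_fr[0]
print(article["title"])
print(article["text"][:200])  # raw text; lists, tables and headers still need filtering
```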

## Training procedure
@@ -103,7 +103,7 @@ then of the form:

### Training

The FrALBERT procedure follows the BERT setup.

The details of the masking procedure for each sentence are the following:
- 15% of the tokens are masked.
|