Dani committed
Commit 255b995
Parent(s): d2f8e65

Update readme and config

- README.md +31 -0
- config.json +1 -1
README.md
ADDED
@@ -0,0 +1,31 @@
---
language: spanish
license: apache-2.0
datasets:
- wikipedia
widget:
- text: "El español es un idioma muy [MASK] en el mundo."
---

# DistilBERT base multilingual model Spanish subset (cased)

This model is the Spanish extract of `distilbert-base-multilingual-cased`, a distilled version of the [BERT base multilingual model](bert-base-multilingual-cased). It uses the extraction method proposed by Geotrend, which is described in https://github.com/Geotrend-research/smaller-transformers.

In particular, we ran the following script:
15 |
+
|
16 |
+
```sh
|
17 |
+
python reduce_model.py \
|
18 |
+
--source_model distilbert-base-multilingual-cased \
|
19 |
+
--vocab_file notebooks/selected_tokens/selected_es_tokens.txt \
|
20 |
+
--output_model distilbert-base-es-multilingual-cased \
|
21 |
+
--convert_to_tf False
|
22 |
+
```
|
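
Once published, the reduced model loads like any other DistilBERT checkpoint. As a quick usage sketch (not part of the original card; it assumes the model is available on the Hub under the `--output_model` name used above):

```python
from transformers import pipeline

# Hypothetical hub id, taken from the --output_model argument above.
fill_mask = pipeline("fill-mask", model="distilbert-base-es-multilingual-cased")

# The example sentence from the widget metadata in the front matter.
for prediction in fill_mask("El español es un idioma muy [MASK] en el mundo."):
    print(f"{prediction['token_str']}\t{prediction['score']:.3f}")
```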
23 |
+
|
24 |
+
The resulting model has the same architecture as DistilmBERT: 6 layers, 768 dimension and 12 heads, with a total of **65M parameters** (compared to 134M parameters for DistilmBERT).
|
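
Almost all of the savings come from the embedding matrix, since the transformer body is left untouched. As a back-of-the-envelope check (ours, not from the original card; the ~29K Spanish vocabulary size is an assumption chosen to reproduce the published totals):

```python
# Rough DistilBERT parameter count: embeddings + 6 identical transformer layers.
DIM, LAYERS, FFN, MAX_POS = 768, 6, 3072, 512

def distilbert_params(vocab_size: int) -> int:
    attention = 4 * (DIM * DIM + DIM)        # Q, K, V and output projections
    ffn = 2 * (DIM * FFN) + FFN + DIM        # two feed-forward projections
    layer = attention + ffn + 2 * 2 * DIM    # plus two layer norms
    embeddings = (vocab_size + MAX_POS) * DIM + 2 * DIM  # token + position + norm
    return embeddings + LAYERS * layer

print(f"DistilmBERT (vocab 119,547): {distilbert_params(119_547) / 1e6:.1f}M")
print(f"Spanish subset (assumed vocab ~29,000): {distilbert_params(29_000) / 1e6:.1f}M")
```

This prints roughly 134.7M vs 65.2M, in line with the figures above.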
25 |
+
|
26 |
+
The goal of this model is to reduce even further the size of the `distilbert-base-multilingual` multilingual model by selecting only most frequent tokens for Spanish, reducing the size of the embedding layer. For more details visit the paper from the Geotrend team: Load What You Need: Smaller Versions of Multilingual BERT.
|
27 |
+
|
28 |
+
|
29 |
+
|
30 |
+
|
31 |
+
|
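
The token selection itself is implemented in the Geotrend repository linked above. Purely as an illustration of the idea (an assumed sketch, not their actual code), frequency-based selection over a Spanish corpus could look like:

```python
from collections import Counter

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-multilingual-cased")

# Count how often each multilingual subword appears in Spanish text.
counts = Counter()
with open("es_corpus.txt", encoding="utf-8") as corpus:  # hypothetical corpus file
    for line in corpus:
        counts.update(tokenizer.tokenize(line))

# Keep the special tokens plus the most frequent Spanish subwords.
selected = list(tokenizer.all_special_tokens)
selected += [token for token, _ in counts.most_common(29_000)]

with open("selected_es_tokens.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(selected))
```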
config.json
CHANGED
@@ -1,7 +1,7 @@
 {
   "activation": "gelu",
   "architectures": [
-    "
+    "DistilBertForMaskedLM"
   ],
   "attention_dropout": 0.1,
   "dim": 768,
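
The `architectures` field records which head the checkpoint was exported with; among other things, the Hub inference widget relies on it to infer the task, so listing `DistilBertForMaskedLM` is what makes the fill-mask widget above work. A quick check (assuming the hypothetical hub id from the README):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("distilbert-base-es-multilingual-cased")
print(config.architectures)  # expected: ['DistilBertForMaskedLM']
```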