Update README.md
README.md CHANGED
@@ -45,6 +45,21 @@
This is the mLUKE large model with 24 hidden layers, 1024 hidden size. The total number of parameters in this model is 868M (561M for the word embeddings and encoder, 307M for the entity embeddings).
The model was initialized with the weights of XLM-RoBERTa(large) and trained using the December 2020 version of Wikipedia in 24 languages.
## Note

When you load the model via `AutoModel.from_pretrained` with the default configuration, you will see the following warning:

```
Some weights of the model checkpoint at studio-ousia/mluke-base-lite were not used when initializing LukeModel: [
'luke.encoder.layer.0.attention.self.w2e_query.weight', 'luke.encoder.layer.0.attention.self.w2e_query.bias',
'luke.encoder.layer.0.attention.self.e2w_query.weight', 'luke.encoder.layer.0.attention.self.e2w_query.bias',
'luke.encoder.layer.0.attention.self.e2e_query.weight', 'luke.encoder.layer.0.attention.self.e2e_query.bias',
...]
```
These are the weights for the entity-aware attention mechanism (described in [the LUKE paper](https://arxiv.org/abs/2010.01057)).
This is expected: `use_entity_aware_attention` is set to `false` by default, but the pretrained checkpoint still contains these weights so that they can be loaded into the model if you enable `use_entity_aware_attention`.
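
If you want the model to actually use these weights, enable entity-aware attention in the configuration before loading. Below is a minimal sketch, assuming the standard `transformers` `AutoConfig`/`AutoModel` API (the flag name comes from the config option mentioned above):

```python
from transformers import AutoConfig, AutoModel

# Loading with the default configuration skips the w2e/e2w/e2e query
# weights (use_entity_aware_attention is false), which triggers the warning.
model = AutoModel.from_pretrained("studio-ousia/mluke-base-lite")

# Sketch: override the flag through the config so the entity-aware
# attention modules are instantiated and the checkpoint weights are loaded.
config = AutoConfig.from_pretrained(
    "studio-ousia/mluke-base-lite", use_entity_aware_attention=True
)
model = AutoModel.from_pretrained("studio-ousia/mluke-base-lite", config=config)
```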
### Citation
If you find mLUKE useful for your work, please cite the following paper: