Fill-Mask
Transformers
PyTorch
xlm-roberta
Inference Endpoints
fenchri committed on
Commit c1f1a7d
1 Parent(s): 3f1a185

Update README.md

Files changed (1)
  1. README.md +15 -9
README.md CHANGED
@@ -46,10 +46,13 @@ language:
 
  # Model Card for EntityCS-39-MLM-xlmr-base
 
  This model has been trained on the EntityCS corpus, an English corpus from Wikipedia with replaced entities in different languages.
  The corpus can be found in [https://huggingface.co/huawei-noah/entity_cs](https://huggingface.co/huawei-noah/entity_cs), check the link for more details.
-
- Firstly, we employ the conventional 80-10-10 MLM objective, where 15% of sentence subwords are considered as masking candidates. From those, we replace subwords
  with [MASK] 80% of the time, with Random subwords (from the entire vocabulary) 10% of the time, and leave the remaining 10% unchanged (Same).
 
  To integrate entity-level cross-lingual knowledge into the model, we propose Entity Prediction objectives, where we only mask subwords belonging
@@ -82,14 +85,12 @@ This results into the following objectives: WEP + MLM, PEP<sub>MRS</sub> + MLM,
  This model was trained with the **MLM** objective on the EntityCS corpus with 39 languages.
 
 
- ## Model Details
-
- ### Training Details
 
  We start from the [XLM-R-base](https://huggingface.co/xlm-roberta-base) model and train for 1 epoch on 8 Nvidia V100 32GB GPUs.
  We set batch size to 16 and gradient accumulation steps to 2, resulting in an effective batch size of 256.
  For speedup we use fp16 mixed precision.
- We use the sampling strategy proposed by [Conneau and Lample (2019)](), where high resource languages are down-sampled and low
  resource languages get sampled more frequently.
  We only train the embedding and the last two layers of the model.
  We randomly choose 100 sentences from each language to serve as a validation set, on which we measure the perplexity every 10K training steps.
@@ -104,9 +105,12 @@ In the paper, we focused on entity-related tasks, such as NER, Word Sense Disamb
 
  Alternatively, it can be used directly (no fine-tuning) for probing tasks, i.e. predict missing words, such as [X-FACTR](https://aclanthology.org/2020.emnlp-main.479/).
 
  ## How to Get Started with the Model
 
- Use the code below to get started with the model: https://github.com/huawei-noah/noah-research/tree/master/NLP/EntityCS
 
  ## Citation
 
@@ -128,6 +132,8 @@ Use the code below to get started with the model: https://github.com/huawei-noah
  }
  ```
 
- ## Model Card Contact
 
- [Fenia Christopoulou](mailto:[email protected])
 
  # Model Card for EntityCS-39-MLM-xlmr-base
 
+ - Paper: https://aclanthology.org/2022.findings-emnlp.499.pdf
+ - Repository: https://github.com/huawei-noah/noah-research/tree/master/NLP/EntityCS
+ - Point of Contact: [Fenia Christopoulou](mailto:[email protected]), [Chenxi Whitehouse](mailto:[email protected])
+
  This model has been trained on the EntityCS corpus, an English corpus from Wikipedia with replaced entities in different languages.
  The corpus can be found in [https://huggingface.co/huawei-noah/entity_cs](https://huggingface.co/huawei-noah/entity_cs), check the link for more details.
+ To train models on the corpus, we first employ the conventional 80-10-10 MLM objective, where 15% of sentence subwords are considered as masking candidates. From those, we replace subwords
  with [MASK] 80% of the time, with Random subwords (from the entire vocabulary) 10% of the time, and leave the remaining 10% unchanged (Same).
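The 80-10-10 masking step described above can be sketched in plain Python. This is a minimal illustration, not the authors' training code. Three things are assumptions: the function name `mask_80_10_10`, selecting each subword independently with probability 0.15 (the approach common MLM data collators take), and the `-100` label convention (PyTorch's default `ignore_index` for cross-entropy).

```python
import random
from typing import FrozenSet, List, Optional, Tuple

IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss skips targets equal to this by default


def mask_80_10_10(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: FrozenSet[int] = frozenset(),
    mlm_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: apply 80-10-10 masking and return (model inputs, labels)."""
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Each non-special subword becomes a masking candidate with probability mlm_prob.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model is trained to recover the original subword here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random subword from the whole vocabulary
        # remaining 10%: keep the original subword (Same)
    return inputs, labels
```

With a Hugging Face tokenizer, you would pass `tokenizer.mask_token_id`, `len(tokenizer)` and `frozenset(tokenizer.all_special_ids)` as the mask ID, vocabulary size and special IDs. As in BERT-style masking, a "random" replacement can occasionally coincide with the original subword.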
 
  To integrate entity-level cross-lingual knowledge into the model, we propose Entity Prediction objectives, where we only mask subwords belonging
 
  This model was trained with the **MLM** objective on the EntityCS corpus with 39 languages.
 
 
+ ## Training Details
 
  We start from the [XLM-R-base](https://huggingface.co/xlm-roberta-base) model and train for 1 epoch on 8 Nvidia V100 32GB GPUs.
  We set batch size to 16 and gradient accumulation steps to 2, resulting in an effective batch size of 256.
  For speedup we use fp16 mixed precision.
+ We use the sampling strategy proposed by [Conneau and Lample (2019)](https://dl.acm.org/doi/pdf/10.5555/3454287.3454921), where high resource languages are down-sampled and low
  resource languages get sampled more frequently.
  We only train the embedding and the last two layers of the model.
  We randomly choose 100 sentences from each language to serve as a validation set, on which we measure the perplexity every 10K training steps.
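The training setup above can be made concrete with a few small helpers. This is a hedged sketch, not the authors' code, and several parts are assumptions:

- The helper names are hypothetical.
- The language sampler follows the exponentially smoothed multinomial from Conneau and Lample (2019). The default `alpha=0.5` is only a placeholder, because the card does not state the value used.
- The parameter-name patterns assume the Hugging Face `XLMRobertaForMaskedLM` naming (`roberta.embeddings.*`, `roberta.encoder.layer.<i>.*`), with 12 encoder layers in XLM-R-base.

```python
import re
from typing import Dict

# Hypothetical helpers illustrating the training setup; not the authors' code.


def effective_batch_size(per_gpu_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Number of examples contributing to one optimizer update across all GPUs."""
    return per_gpu_batch * grad_accum_steps * num_gpus


def smoothed_sampling_probs(sentence_counts: Dict[str, int], alpha: float = 0.5) -> Dict[str, float]:
    """Exponentially smoothed language sampling: q_i = p_i**alpha / sum_j p_j**alpha.

    With alpha < 1, high-resource languages are down-sampled and low-resource
    languages are up-sampled relative to their raw share p_i. alpha=0.5 is an
    assumed placeholder; the card does not give the value.
    """
    total = sum(sentence_counts.values())
    weights = {lang: (n / total) ** alpha for lang, n in sentence_counts.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


# Assumes Hugging Face XLM-R parameter names (with or without the "roberta." prefix).
_EMB_RE = re.compile(r"(^|\.)embeddings\.")
_LAYER_RE = re.compile(r"encoder\.layer\.(\d+)\.")


def is_trainable(param_name: str, num_layers: int = 12, last_k: int = 2) -> bool:
    """True for embedding parameters and for the last `last_k` encoder layers."""
    if _EMB_RE.search(param_name):
        return True
    match = _LAYER_RE.search(param_name)
    return match is not None and int(match.group(1)) >= num_layers - last_k
```

With the card's numbers, `effective_batch_size(16, 2, 8)` gives 256. To freeze everything else in PyTorch, you could loop over `model.named_parameters()` and set `param.requires_grad = is_trainable(name)`. The card does not say whether the LM head's dense and layer-norm weights were also trained; this predicate leaves them frozen.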
 
 
  Alternatively, it can be used directly (no fine-tuning) for probing tasks, i.e. predict missing words, such as [X-FACTR](https://aclanthology.org/2020.emnlp-main.479/).
 
+ For results on each downstream task, please refer to the paper.
+
+
  ## How to Get Started with the Model
 
+ Use the code below to get started with the model: https://github.com/huawei-noah/noah-research/tree/master/NLP/EntityCS
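The repository linked above holds the official code. As an unofficial quick start for probing without fine-tuning, you can point the generic `transformers` fill-mask pipeline at the checkpoint. This sketch assumes `transformers` and `torch` are installed. It also assumes the model is published on the Hub as `huawei-noah/EntityCS-39-MLM-xlmr-base`, which is an assumed ID: check the model page. The helper names are hypothetical.

```python
from functools import lru_cache
from typing import List, Tuple

# Assumed Hub ID (hypothetical); verify on the model page before use.
MODEL_ID = "huawei-noah/EntityCS-39-MLM-xlmr-base"


def to_model_prompt(template: str, mask_token: str = "<mask>") -> str:
    """Swap a generic [MASK] placeholder for the model's own mask token (XLM-R uses <mask>)."""
    return template.replace("[MASK]", mask_token)


@lru_cache(maxsize=1)
def _fill_mask():
    # Imported lazily so the prompt helper works even without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=MODEL_ID)


def probe(template: str, top_k: int = 5) -> List[Tuple[str, float]]:
    """Return the top_k (token, score) predictions for a prompt containing one [MASK]."""
    fill_mask = _fill_mask()
    prompt = to_model_prompt(template, fill_mask.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill_mask(prompt, top_k=top_k)]
```

For example, `probe("Paris is the capital of [MASK].")` returns the top five candidate tokens with their scores. The model is downloaded on the first call. Caching the pipeline with `lru_cache` avoids reloading it on later calls.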
 
  ## Citation
 
 
  }
  ```
 
+ **APA:**
 
+ ```html
+ Whitehouse, C., Christopoulou, F., & Iacobacci, I. (2022). EntityCS: Improving Zero-Shot Cross-lingual Transfer with Entity-Centric Code Switching. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 6698–6714). Association for Computational Linguistics.
+ ```