Update README.md
README.md
Here are scores on the development set of six Chinese tasks:

|Model|Score|douban|chnsenticorp|lcqmc|tnews(CLUE)|iflytek(CLUE)|ocnli(CLUE)|
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|RoBERTa-Tiny|72.3|83.0|91.4|81.8|62.0|55.0|60.3|
|RoBERTa-Mini|75.7|84.8|93.7|86.1|63.9|58.3|67.4|
|RoBERTa-Small|76.8|86.5|93.4|86.5|65.1|59.4|69.7|
|RoBERTa-Medium|77.8|87.6|94.8|88.1|65.6|59.5|71.2|
|RoBERTa-Base|79.5|89.1|95.2|89.2|67.0|60.9|75.5|

For each task, we selected the best fine-tuning hyperparameters from the lists below:
- epochs: 3, 5, 8
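
As a rough illustration of that selection process, here is a sketch of a grid search that keeps the configuration with the best development-set score. The `train_and_evaluate` helper and the batch-size and learning-rate values are hypothetical placeholders; the scores in the table above come from the authors' own fine-tuning runs, not from this loop.

```python
from itertools import product

# Hypothetical sketch of picking the best fine-tuning configuration for one task.
epochs_grid = [3, 5, 8]                    # values listed above
batch_size_grid = [32, 64]                 # placeholder values, not from this card
learning_rate_grid = [2e-5, 3e-5, 5e-5]    # placeholder values, not from this card

best_score, best_config = float("-inf"), None
for epochs, batch_size, lr in product(epochs_grid, batch_size_grid, learning_rate_grid):
    # train_and_evaluate is a hypothetical helper that fine-tunes the model on the
    # task's training split and returns its development-set score.
    score = train_and_evaluate(epochs=epochs, batch_size=batch_size, learning_rate=lr)
    if score > best_score:
        best_score = score
        best_config = {"epochs": epochs, "batch_size": batch_size, "learning_rate": lr}

print(best_config, best_score)
```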

## How to use

You can use this model directly with a pipeline for masked language modeling (take the case of RoBERTa-Medium):

```python
>>> from transformers import pipeline
```
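
A minimal sketch of a complete call, assuming the RoBERTa-Medium checkpoint is published as `uer/chinese_roberta_L-8_H-512` (an assumption for illustration; substitute the checkpoint you actually use):

```python
>>> from transformers import pipeline
>>> # The model id below is an assumed example, not confirmed by this card.
>>> unmasker = pipeline("fill-mask", model="uer/chinese_roberta_L-8_H-512")
>>> unmasker("北京是[MASK]国的首都。")
```

The pipeline returns the highest-scoring replacements for the `[MASK]` token together with their scores.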

CLUECorpusSmall is used as training data.

Models are pre-trained by [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud TI-ONE](https://cloud.tencent.com/product/tione/). We pre-train 1,000,000 steps with a sequence length of 128 and then pre-train 250,000 additional steps with a sequence length of 512.

Taking the case of RoBERTa-Medium:

Stage1:
```
python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
                      --vocab_path models/google_zh_vocab.txt \

python3 pretrain.py --dataset_path cluecorpussmall_seq128_dataset.pt \
                    --learning_rate 1e-4 --batch_size 64 \
                    --tie_weights --embedding word_pos_seg --encoder transformer --mask fully_visible --target mlm
```
Stage2:
```
python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
                      --vocab_path models/google_zh_vocab.txt \
```