uer committed
Commit 1eb83a7 · 1 Parent(s): fde00b5

Update README.md

Files changed (1)
  1. README.md +10 -9
README.md CHANGED
@@ -29,11 +29,11 @@ Here are scores on the development set of six Chinese tasks:

  |Model|Score|douban|chnsenticorp|lcqmc|tnews(CLUE)|iflytek(CLUE)|ocnli(CLUE)|
  |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
- |BERT-Tiny|72.3|83.0|91.4|81.8|62.0|55.0|60.3|
- |BERT-Mini|75.7|84.8|93.7|86.1|63.9|58.3|67.4|
- |BERT-Small|76.8|86.5|93.4|86.5|65.1|59.4|69.7|
- |BERT-Medium|77.8|87.6|94.8|88.1|65.6|59.5|71.2|
- |BERT-Base|79.5|89.1|95.2|89.2|67.0|60.9|75.5|
+ |RoBERTa-Tiny|72.3|83.0|91.4|81.8|62.0|55.0|60.3|
+ |RoBERTa-Mini|75.7|84.8|93.7|86.1|63.9|58.3|67.4|
+ |RoBERTa-Small|76.8|86.5|93.4|86.5|65.1|59.4|69.7|
+ |RoBERTa-Medium|77.8|87.6|94.8|88.1|65.6|59.5|71.2|
+ |RoBERTa-Base|79.5|89.1|95.2|89.2|67.0|60.9|75.5|

  For each task, we selected the best fine-tuning hyperparameters from the lists below:
  - epochs: 3, 5, 8
@@ -42,7 +42,7 @@ For each task, we selected the best fine-tuning hyperparameters from the lists b

  ## How to use

- You can use this model directly with a pipeline for masked language modeling (take the case of BERT-Medium):
+ You can use this model directly with a pipeline for masked language modeling (take the case of RoBERTa-Medium):

  ```python
  >>> from transformers import pipeline
@@ -102,8 +102,9 @@ CLUECorpusSmall is used as training data. We found that models pre-trained on CL

  Models are pre-trained by [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud TI-ONE](https://cloud.tencent.com/product/tione/). We pre-train 1,000,000 steps with a sequence length of 128 and then pre-train 250,000 additional steps with a sequence length of 512.

- Taking the case of BERT-Medium:
- Stage1
+ Taking the case of RoBERTa-Medium
+
+ Stage1:
  ```
  python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
  --vocab_path models/google_zh_vocab.txt \
@@ -121,7 +122,7 @@ python3 pretrain.py --dataset_path cluecorpussmall_seq128_dataset.pt \
  --learning_rate 1e-4 --batch_size 64 \
  --tie_weights --embedding word_pos_seg --encoder transformer --mask fully_visible --target mlm
  ```
- Stage2
+ Stage2:
  ```
  python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
  --vocab_path models/google_zh_vocab.txt \
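
The `>>> from transformers import pipeline` context line above is where the README's fill-mask example begins; the diff truncates the rest of it. A minimal sketch of that usage, assuming the renamed RoBERTa-Medium checkpoint lives at the `uer/chinese_roberta_L-8_H-512` repository (the model ID is an assumption; substitute the actual repo name):

```python
>>> from transformers import pipeline
>>> # Model ID below is an assumption; point it at the actual RoBERTa-Medium repo.
>>> unmasker = pipeline('fill-mask', model='uer/chinese_roberta_L-8_H-512')
>>> unmasker("北京是[MASK]国的首都。")
```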
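Outside the pipeline API, the checkpoint loads as a standard BERT-architecture encoder. This is generic `transformers` usage rather than something shown in the commit, with the same assumed model ID:

```python
>>> from transformers import BertTokenizer, BertModel
>>> # Any of the RoBERTa-* checkpoints in the table loads the same way; the ID is an assumption.
>>> tokenizer = BertTokenizer.from_pretrained('uer/chinese_roberta_L-8_H-512')
>>> model = BertModel.from_pretrained('uer/chinese_roberta_L-8_H-512')
>>> inputs = tokenizer("用你喜欢的任何文本替换我。", return_tensors='pt')
>>> outputs = model(**inputs)
```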
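The diff cuts both pre-training stages off after the first `preprocess.py` arguments. For orientation only, here is a sketch of how Stage2 resumes from the Stage1 checkpoint with 512-token sequences for the 250,000 additional steps mentioned above, based on UER-py's documented options; every path, the checkpoint names, and any flag not visible in the diff are assumptions:

```
# Sketch only: dataset/checkpoint paths and flags not shown in the diff are assumed.
python3 pretrain.py --dataset_path cluecorpussmall_seq512_dataset.pt \
                    --vocab_path models/google_zh_vocab.txt \
                    --pretrained_model_path models/cluecorpussmall_roberta_medium_seq128_model.bin \
                    --output_model_path models/cluecorpussmall_roberta_medium_seq512_model.bin \
                    --total_steps 250000 \
                    --tie_weights --embedding word_pos_seg --encoder transformer --mask fully_visible --target mlm
```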