uer committed
Commit 1eb83a7 · 1 Parent(s): fde00b5

Update README.md

Files changed (1)
  1. README.md +10 -9
README.md CHANGED
@@ -29,11 +29,11 @@ Here are scores on the development set of six Chinese tasks:

  |Model|Score|douban|chnsenticorp|lcqmc|tnews(CLUE)|iflytek(CLUE)|ocnli(CLUE)|
  |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
- |BERT-Tiny|72.3|83.0|91.4|81.8|62.0|55.0|60.3|
- |BERT-Mini|75.7|84.8|93.7|86.1|63.9|58.3|67.4|
- |BERT-Small|76.8|86.5|93.4|86.5|65.1|59.4|69.7|
- |BERT-Medium|77.8|87.6|94.8|88.1|65.6|59.5|71.2|
- |BERT-Base|79.5|89.1|95.2|89.2|67.0|60.9|75.5|
+ |RoBERTa-Tiny|72.3|83.0|91.4|81.8|62.0|55.0|60.3|
+ |RoBERTa-Mini|75.7|84.8|93.7|86.1|63.9|58.3|67.4|
+ |RoBERTa-Small|76.8|86.5|93.4|86.5|65.1|59.4|69.7|
+ |RoBERTa-Medium|77.8|87.6|94.8|88.1|65.6|59.5|71.2|
+ |RoBERTa-Base|79.5|89.1|95.2|89.2|67.0|60.9|75.5|

  For each task, we selected the best fine-tuning hyperparameters from the lists below:
  - epochs: 3, 5, 8
@@ -42,7 +42,7 @@ For each task, we selected the best fine-tuning hyperparameters from the lists b

  ## How to use

- You can use this model directly with a pipeline for masked language modeling (take the case of BERT-Medium):
+ You can use this model directly with a pipeline for masked language modeling (take the case of RoBERTa-Medium):

  ```python
  >>> from transformers import pipeline
@@ -102,8 +102,9 @@ CLUECorpusSmall is used as training data. We found that models pre-trained on CL

  Models are pre-trained by [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud TI-ONE](https://cloud.tencent.com/product/tione/). We pre-train 1,000,000 steps with a sequence length of 128 and then pre-train 250,000 additional steps with a sequence length of 512.

- Taking the case of BERT-Medium:
- Stage1
+ Taking the case of RoBERTa-Medium
+
+ Stage1:
  ```
  python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
  --vocab_path models/google_zh_vocab.txt \
@@ -121,7 +122,7 @@ python3 pretrain.py --dataset_path cluecorpussmall_seq128_dataset.pt \
  --learning_rate 1e-4 --batch_size 64 \
  --tie_weights --embedding word_pos_seg --encoder transformer --mask fully_visible --target mlm
  ```
- Stage2
+ Stage2:
  ```
  python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
  --vocab_path models/google_zh_vocab.txt \
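
The `>>> from transformers import pipeline` context line above is where the README's fill-mask example begins; the diff truncates the rest of it. A minimal sketch of that usage, assuming the renamed RoBERTa-Medium checkpoint lives at the `uer/chinese_roberta_L-8_H-512` repository (the model ID is an assumption; substitute the actual repo name):

```python
>>> from transformers import pipeline
>>> # Model ID below is an assumption; point it at the actual RoBERTa-Medium repo.
>>> unmasker = pipeline('fill-mask', model='uer/chinese_roberta_L-8_H-512')
>>> unmasker("北京是[MASK]国的首都。")
```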
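Outside the pipeline API, the checkpoint loads as a standard BERT-architecture encoder. This is generic `transformers` usage rather than something shown in the commit, with the same assumed model ID:

```python
>>> from transformers import BertTokenizer, BertModel
>>> # Any of the RoBERTa-* checkpoints in the table loads the same way; the ID is an assumption.
>>> tokenizer = BertTokenizer.from_pretrained('uer/chinese_roberta_L-8_H-512')
>>> model = BertModel.from_pretrained('uer/chinese_roberta_L-8_H-512')
>>> inputs = tokenizer("用你喜欢的任何文本替换我。", return_tensors='pt')
>>> outputs = model(**inputs)
```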
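The diff cuts both pre-training stages off after the first `preprocess.py` arguments. For orientation only, here is a sketch of how Stage2 resumes from the Stage1 checkpoint with 512-token sequences for the 250,000 additional steps mentioned above, based on UER-py's documented options; every path, the checkpoint names, and any flag not visible in the diff are assumptions:

```
# Sketch only: dataset/checkpoint paths and flags not shown in the diff are assumed.
python3 pretrain.py --dataset_path cluecorpussmall_seq512_dataset.pt \
                    --vocab_path models/google_zh_vocab.txt \
                    --pretrained_model_path models/cluecorpussmall_roberta_medium_seq128_model.bin \
                    --output_model_path models/cluecorpussmall_roberta_medium_seq512_model.bin \
                    --total_steps 250000 \
                    --tie_weights --embedding word_pos_seg --encoder transformer --mask fully_visible --target mlm
```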