bert-base-zh-cased

We are sharing smaller versions of bert-base-multilingual-cased that handle a custom number of languages.

Unlike distilbert-base-multilingual-cased, our versions produce exactly the same representations as the original model, which preserves the original accuracy.
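
As a quick sanity check of this claim (a minimal sketch; the sample sentence is illustrative and assumes all of its tokens are covered by the reduced vocabulary), the hidden states of the smaller model can be compared against those of bert-base-multilingual-cased:

import torch
from transformers import AutoTokenizer, AutoModel

text = "巴黎是法国的首都。"  # illustrative Chinese sentence

# Load the smaller model and the original multilingual model.
tok_small = AutoTokenizer.from_pretrained("Geotrend/bert-base-zh-cased")
model_small = AutoModel.from_pretrained("Geotrend/bert-base-zh-cased")
tok_full = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model_full = AutoModel.from_pretrained("bert-base-multilingual-cased")

with torch.no_grad():
    small = model_small(**tok_small(text, return_tensors="pt")).last_hidden_state
    full = model_full(**tok_full(text, return_tensors="pt")).last_hidden_state

# Expected: True, since only the embedding matrix was shrunk.
print(torch.allclose(small, full, atol=1e-5))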

For more information, please refer to our paper: Load What You Need: Smaller Versions of Multilingual BERT.

How to use

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-zh-cased")
model = AutoModel.from_pretrained("Geotrend/bert-base-zh-cased")
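
For example (a minimal sketch; the input sentence is illustrative), you can encode a sentence and retrieve its contextual representations:

inputs = tokenizer("你好,世界!", return_tensors="pt")
outputs = model(**inputs)

# Token-level contextual embeddings, shape (1, sequence_length, 768).
last_hidden_state = outputs.last_hidden_state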

To generate other smaller versions of multilingual transformers, please visit our GitHub repo.

How to cite

@inproceedings{smallermbert,
  title={Load What You Need: Smaller Versions of Multilingual BERT},
  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
  booktitle={SustaiNLP / EMNLP},
  year={2020}
}

Contact

Please contact [email protected] with any questions, feedback, or requests.
