|
--- |
|
license: afl-3.0 |
|
language: |
|
- zh |
|
tags: |
|
- bert |
|
- chinesebert |
|
- MLM |
|
pipeline_tag: fill-mask |
|
--- |
|
|
|
# ChineseBERT-large |
|
|
|
This project repackages ChineseBERT so that it can be loaded directly through the HuggingFace API, with no extra code configuration required.
|
|
|
Original paper:
|
**[ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information](https://arxiv.org/abs/2106.16038)** |
|
*Zijun Sun, Xiaoya Li, Xiaofei Sun, Yuxian Meng, Xiang Ao, Qing He, Fei Wu and Jiwei Li* |
|
|
|
Original repository:
|
[ChineseBERT github link](https://github.com/ShannonAI/ChineseBert) |
|
|
|
Original model:
|
[ShannonAI/ChineseBERT-base](https://huggingface.co/ShannonAI/ChineseBERT-base) (that model cannot be loaded directly through the HuggingFace API)
|
|
|
# Usage
|
|
|
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/iioSnail/ChineseBert/blob/main/demo/ChineseBERT-Demo.ipynb) |
|
|
|
1. Install pypinyin (ChineseBERT incorporates pinyin information, and pypinyin is needed by the tokenizer; a quick sanity check follows the install command):
|
|
|
``` |
|
pip install pypinyin |
|
``` |
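
As a quick sanity check that pypinyin works (a hedged illustration only; the exact pinyin style the model's tokenizer uses internally is defined by the original repo's code):

```python
from pypinyin import pinyin, Style

# Tone-numbered pinyin for each character, e.g. [['wo3'], ['xi3'], ['huan1'], ['mao1']]
print(pinyin("我喜欢猫", style=Style.TONE3))
```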
|
|
|
2. Load the tokenizer and model with the AutoClass API:
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModel |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("iioSnail/ChineseBERT-large", trust_remote_code=True) |
|
model = AutoModel.from_pretrained("iioSnail/ChineseBERT-large", trust_remote_code=True) |
|
``` |
|
|
|
3. From here on, usage is the same as with a regular BERT model:
|
|
|
```python |
|
inputs = tokenizer(["我 喜 [MASK] 猫"], return_tensors='pt') |
|
logits = model(**inputs).logits |
|
|
|
# Decode the most likely token at each position, skipping [CLS] and [SEP]
print(tokenizer.decode(logits.argmax(-1)[0, 1:-1]))
|
``` |
|
|
|
Output:
|
|
|
``` |
|
我 喜 欢 猫
|
``` |
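
If you want more than the single best token, here is a minimal sketch of listing the top-5 candidates for the masked position, continuing from the snippet above (it assumes the custom tokenizer exposes the standard BERT-style `mask_token_id` and `convert_ids_to_tokens` attributes):

```python
import torch

# Locate the [MASK] position in the encoded input
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

# Top-5 candidate tokens for the masked position
top5 = torch.topk(logits[0, mask_pos], k=5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top5.tolist()))
```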
|
|
|
# FAQ
|
|
|
1. Network problems, e.g. `Connection Error`
|
|
|
Solution: download the model and load it from a local directory, as in the sketch below. For downloading models in bulk, see this [blog post](https://blog.csdn.net/zhaohongfei_358/article/details/126222999).
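
A minimal sketch of downloading the repository and loading it from disk, assuming `huggingface_hub` is installed (the local directory name is arbitrary):

```python
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoModel

# Download the full model repository to a local directory
local_dir = snapshot_download(repo_id="iioSnail/ChineseBERT-large", local_dir="ChineseBERT-large")

tokenizer = AutoTokenizer.from_pretrained(local_dir, trust_remote_code=True)
model = AutoModel.from_pretrained(local_dir, trust_remote_code=True)
```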
|
|
|
2. When loading a locally downloaded model, you get the error: `ModuleNotFoundError: No module named 'transformers_modules.iioSnail/ChineseBERT-large'`
|
|
|
Solution: change `iioSnail/ChineseBERT-large` to `iioSnail\ChineseBERT-large` (i.e., use a backslash; this error typically occurs on Windows, where the local path separator is a backslash).
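
For example, a sketch of the fix (assuming the model folder sits at `iioSnail\ChineseBERT-large` relative to the working directory; note the raw string so Python does not treat the backslash as an escape):

```python
from transformers import AutoTokenizer, AutoModel

# Use a backslash-separated path when loading the local copy on Windows
tokenizer = AutoTokenizer.from_pretrained(r"iioSnail\ChineseBERT-large", trust_remote_code=True)
model = AutoModel.from_pretrained(r"iioSnail\ChineseBERT-large", trust_remote_code=True)
```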