Sources:
- https://github.com/THUDM/GLM/tree/main/chinese_sentencepiece
- https://huggingface.co/THUDM/glm-10b-chinese/
## HF
```
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b-chinese", trust_remote_code=True)
```
## Tokenizer
The relevant entry in tokenizer_config.json:
```
"AutoTokenizer": [
"tokenization_glm.GLMChineseTokenizer",
null
]
```
where GLMChineseTokenizer is defined in:
```
https://huggingface.co/THUDM/glm-10b-chinese/blob/main/tokenization_glm.py
```
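The `AutoTokenizer` entry shown earlier is a two-element list of `module.ClassName` strings, where `null` means no class is given for that slot. The sketch below illustrates how such an entry can be split into a module name and a class name. It is not the transformers implementation: the enclosing `auto_map` key, the `CONFIG_TEXT` excerpt, and the helper `resolve_auto_tokenizer` are assumptions made for illustration, as is the reading of the two slots as slow and fast tokenizer.

```python
import json

# Hypothetical excerpt: the fragment from tokenizer_config.json,
# wrapped in an "auto_map" key (an assumption, not shown in this README).
CONFIG_TEXT = """
{
  "auto_map": {
    "AutoTokenizer": ["tokenization_glm.GLMChineseTokenizer", null]
  }
}
"""


def resolve_auto_tokenizer(config_text):
    """Split each AutoTokenizer entry into (module_name, class_name).

    Entries that are null in the JSON are returned as None.
    """
    config = json.loads(config_text)
    entries = config["auto_map"]["AutoTokenizer"]
    resolved = []
    for entry in entries:
        if entry is None:
            resolved.append(None)
        else:
            # The class name is everything after the last dot.
            module_name, class_name = entry.rsplit(".", 1)
            resolved.append((module_name, class_name))
    return resolved


# Assumed slot order: first slow tokenizer, then fast tokenizer.
slow, fast = resolve_auto_tokenizer(CONFIG_TEXT)
print(slow)
print(fast)
```

Because the class lives in `tokenization_glm.py` inside the model repository rather than in the transformers package, loading it requires `trust_remote_code=True`.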
## Vocabulary
From