## token
space
```yml
# multi-space
{"id": 881, "token": "\r\n\r\n", "token_decode": "\r\n\r\n", "token_len": 4, "zh_count": 0, "space_count": 4, "digit_count": 0, "zh_symbol_count": 0}
# space + en
{"id": 862, "token": "\treturn", "token_decode": "\treturn", "token_len": 7, "zh_count": 0, "space_count": 1, "digit_count": 0, "zh_symbol_count": 0}
# space + zh
{"id": 40195, "token": " 下", "token_decode": " 下", "token_len": 2, "zh_count": 1, "space_count": 1, "digit_count": 0, "zh_symbol_count": 0}
```
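The count fields in these records can be reproduced with a few lines of Python. This is only a sketch under assumptions: `token_stats` is a hypothetical helper, `zh_count` is assumed to count characters in the basic CJK range U+4E00 to U+9FFF, and `space_count` is assumed to count every whitespace character (which matches `"\r\n\r\n"` giving 4). `zh_symbol_count` is left out because its exact definition is not clear from the examples.

```python
import re

# Assumption: "zh" means the basic CJK Unified Ideographs block.
_ZH_CHAR = re.compile(r"[\u4e00-\u9fff]")


def token_stats(token_id, token):
    """Return per-token counts in the same shape as the records above."""
    return {
        "id": token_id,
        "token": token,
        "token_len": len(token),
        "zh_count": len(_ZH_CHAR.findall(token)),
        # Counts spaces, tabs, \r and \n alike.
        "space_count": sum(ch.isspace() for ch in token),
        "digit_count": sum(ch.isdigit() for ch in token),
    }


if __name__ == "__main__":
    for tid, tok in [(881, "\r\n\r\n"), (862, "\treturn"), (40195, " 下")]:
        print(token_stats(tid, tok))
```

Counting all whitespace rather than only the space character is what makes the tab in `"\treturn"` register as `space_count: 1`.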
special_token
```
{"id": 100257, "token": "<|endoftext|>", "token_decode": "<|endoftext|>", "token_len": 13, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
{"id": 100258, "token": "<|fim_prefix|>", "token_decode": "<|fim_prefix|>", "token_len": 14, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
{"id": 100259, "token": "<|fim_middle|>", "token_decode": "<|fim_middle|>", "token_len": 14, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
{"id": 100260, "token": "<|fim_suffix|>", "token_decode": "<|fim_suffix|>", "token_len": 14, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
{"id": 100276, "token": "<|endofprompt|>", "token_decode": "<|endofprompt|>", "token_len": 15, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
```
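Special tokens like these are usually matched as whole strings rather than being broken into sub-word pieces. The sketch below shows one way that could look: it splits a string into ordinary text and special-token ids, using the ids from the listing above. `split_on_special` is a hypothetical helper for illustration, not part of any tokenizer library.

```python
import re

# Ids copied from the special_token listing above.
SPECIAL_TOKENS = {
    "<|endoftext|>": 100257,
    "<|fim_prefix|>": 100258,
    "<|fim_middle|>": 100259,
    "<|fim_suffix|>": 100260,
    "<|endofprompt|>": 100276,
}

# One alternation that matches any special token literally.
_SPECIAL_RE = re.compile("|".join(re.escape(t) for t in SPECIAL_TOKENS))


def split_on_special(text):
    """Split text into ("text", str) and ("special", id) pieces, in order."""
    pieces = []
    pos = 0
    for match in _SPECIAL_RE.finditer(text):
        if match.start() > pos:
            pieces.append(("text", text[pos:match.start()]))
        pieces.append(("special", SPECIAL_TOKENS[match.group()]))
        pos = match.end()
    if pos < len(text):
        pieces.append(("text", text[pos:]))
    return pieces


if __name__ == "__main__":
    print(split_on_special("def f():<|fim_suffix|>\treturn<|endoftext|>"))
```

The `("text", ...)` pieces would then go through the ordinary tokenizer, while each `("special", id)` piece is emitted as a single id.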
Chinese character + symbol
```
{"id": 39045, "token": ",请", "token_decode": ",请", "token_len": 2, "zh_count": 1, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
```
## Vocabulary file
```
IQ== 0
Ig== 1
Iw== 2
JA== 3
JQ== 4
Jg== 5
Jw== 6
KA== 7
```
What on earth is this?
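Each line looks like a base64-encoded byte string followed by an integer rank, which is the layout tiktoken uses for its `.tiktoken` vocabulary files (the source does not say which file this excerpt comes from, so treat that as an assumption). Decoding the eight lines shown gives the ASCII punctuation characters `!` through `(`. A minimal sketch, where `parse_vocab` is a hypothetical helper:

```python
import base64

# The eight lines from the vocabulary excerpt above.
SAMPLE = """\
IQ== 0
Ig== 1
Iw== 2
JA== 3
JQ== 4
Jg== 5
Jw== 6
KA== 7
"""


def parse_vocab(text):
    """Map decoded token bytes to their integer rank."""
    ranks = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        b64_token, rank = line.split()
        ranks[base64.b64decode(b64_token)] = int(rank)
    return ranks


if __name__ == "__main__":
    for token_bytes, rank in parse_vocab(SAMPLE).items():
        print(rank, token_bytes)
```

Tokens are stored as base64 because a token is an arbitrary byte sequence and need not be valid UTF-8 on its own.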