File size: 1,647 Bytes
f4973d4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
d27a756
 
 
 
 
 
 
 
f4973d4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49


## token



space
```yml
# multi-space
{"id": 881, "token": "\r\n\r\n", "token_decode": "\r\n\r\n", "token_len": 4, "zh_count": 0, "space_count": 4, "digit_count": 0, "zh_symbol_count": 0}
# space + en
{"id": 862, "token": "\treturn", "token_decode": "\treturn", "token_len": 7, "zh_count": 0, "space_count": 1, "digit_count": 0, "zh_symbol_count": 0}
# sapce + zh  
{"id": 40195, "token": " 下", "token_decode": " 下", "token_len": 2, "zh_count": 1, "space_count": 1, "digit_count": 0, "zh_symbol_count": 0}
```


special_token
```
{"id": 100257, "token": "<|endoftext|>", "token_decode": "<|endoftext|>", "token_len": 13, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
{"id": 100258, "token": "<|fim_prefix|>", "token_decode": "<|fim_prefix|>", "token_len": 14, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
{"id": 100259, "token": "<|fim_middle|>", "token_decode": "<|fim_middle|>", "token_len": 14, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
{"id": 100260, "token": "<|fim_suffix|>", "token_decode": "<|fim_suffix|>", "token_len": 14, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
{"id": 100276, "token": "<|endofprompt|>", "token_decode": "<|endofprompt|>", "token_len": 15, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
```

汉字+符号
```
{"id": 39045, "token": ",请", "token_decode": ",请", "token_len": 2, "zh_count": 1, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
```




## 词典文件


```
IQ== 0
Ig== 1
Iw== 2
JA== 3
JQ== 4
Jg== 5
Jw== 6
KA== 7
```

这是啥玩意?