## vocab_file
- ice_text.model
- binary file
- num_image_tokens = 20000

Vocabulary size: 150528
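Below is a minimal sketch of how the two numbers above might relate. It assumes two things the notes do not confirm. First, the first `num_image_tokens` ids are reserved for image tokens. Second, every text id is a raw SentencePiece id shifted by that amount. Under those assumptions, 150528 is the combined size, and `▁` at id 20005 in the table below would be raw piece 5. The names `to_model_id` and `to_sp_id` are hypothetical.

```python
# Constants from the notes; the "offset" interpretation is an assumption.
NUM_IMAGE_TOKENS = 20000
VOCAB_SIZE = 150528

def to_model_id(sp_id: int) -> int:
    """Hypothetical: shift a raw SentencePiece id past the image-token range."""
    return sp_id + NUM_IMAGE_TOKENS

def to_sp_id(model_id: int) -> int:
    """Hypothetical inverse; ids below NUM_IMAGE_TOKENS are assumed to be image tokens."""
    if model_id < NUM_IMAGE_TOKENS:
        raise ValueError(f"{model_id} falls in the assumed image-token range")
    return model_id - NUM_IMAGE_TOKENS

# Text tokens left over if VOCAB_SIZE counts image + text tokens (assumption)
num_text_tokens = VOCAB_SIZE - NUM_IMAGE_TOKENS

# ids of '▁good' and '▁morning' from the examples below
print(to_sp_id(20315), to_sp_id(21774))  # 315 1774
```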
```
tokens: ['▁good', '▁morning'] ; id: [20315, 21774] ; text: good morning
tokens: ['▁good', '<|blank_2|>', 'morning'] ; id: [20315, 150009, 60813] ; text: good morning
tokens: ['▁', 'goog', '▁morning', 'abc'] ; id: [20005, 46456, 21774, 27415] ; text: goog morningabc
tokens: ['▁', '你是谁'] ; id: [20005, 128293] ; text: 你是谁
```
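The `<|blank_2|>` row suggests a particular preprocessing step. Runs of two or more spaces seem to be replaced by a single blank token before SentencePiece sees the text. That would explain why `morning` appears there without a leading `▁`. The following is a minimal sketch of that step under this assumption. The upper bound `MAX_BLANK_LENGTH = 80` and the helper names are hypothetical. Only the `<|blank_n|>` format is taken from the example.

```python
MAX_BLANK_LENGTH = 80  # hypothetical upper bound on blank-token length

def blank_token(n: int) -> str:
    # Token format copied from the '<|blank_2|>' example above
    return f"<|blank_{n}|>"

def encode_whitespaces(text: str, max_len: int = MAX_BLANK_LENGTH) -> str:
    # Replace the longest runs first, so four spaces become <|blank_4|>
    # instead of two <|blank_2|> tokens; single spaces are left for SentencePiece
    for n in range(max_len, 1, -1):
        text = text.replace(" " * n, blank_token(n))
    return text

def decode_whitespaces(text: str, max_len: int = MAX_BLANK_LENGTH) -> str:
    # Expand each blank token back into its run of spaces
    for n in range(max_len, 1, -1):
        text = text.replace(blank_token(n), " " * n)
    return text

print(encode_whitespaces("good  morning"))  # good<|blank_2|>morning
```

Applying SentencePiece to `good<|blank_2|>morning` would then yield `['▁good', '<|blank_2|>', 'morning']`. This matches the second row, provided `<|blank_2|>` is a single piece in the model.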
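What is `▁`? It is not the underscore `_` (U+005F). It is U+2581 (LOWER ONE EIGHTH BLOCK), the symbol SentencePiece uses to stand for a space, typically at the start of a word.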
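Here is a minimal sketch of turning the pieces in the table back into text. It uses SentencePiece's convention of writing spaces as `▁`. It also assumes the model prepends one dummy space to the input (SentencePiece's `add_dummy_prefix` option), which is then dropped on decode. `pieces_to_text` is a hypothetical helper, not ChatGLM's own decoder.

```python
META_SPACE = "\u2581"  # '▁' LOWER ONE EIGHTH BLOCK, not '_' (U+005F LOW LINE)

def pieces_to_text(pieces: list[str]) -> str:
    # Join the pieces and turn every meta-space back into an ordinary space
    text = "".join(pieces).replace(META_SPACE, " ")
    # Drop the single dummy-prefix space (assumed add_dummy_prefix behaviour)
    return text[1:] if text.startswith(" ") else text

print(hex(ord(META_SPACE)), hex(ord("_")))               # 0x2581 0x5f
print(pieces_to_text(["▁good", "▁morning"]))             # good morning
print(pieces_to_text(["▁", "goog", "▁morning", "abc"]))  # goog morningabc
```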
## TextTokenizer
```python
# self.vocab_file is the path to ice_text.model
tokenizer = TextTokenizer(self.vocab_file)
```