## Vocab size mismatch
- .vocab_size
  - Size of the base vocabulary (without the added tokens)
  - Source: https://huggingface.co/transformers/v2.11.0/main_classes/tokenizer.html
- len(tokenizer)
  - Size of the full vocabulary with the added tokens.
  - https://github.com/huggingface/transformers/issues/12632
- max(tokenizer.get_vocab().values())
  - Largest token_id in use; this accounts for non-contiguous token_ids, where gaps in the id range make the largest id exceed the token count.
  - https://github.com/huggingface/transformers/issues/4875
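
The three measures above can disagree, and a small sketch makes the difference concrete. `ToyTokenizer` and `vocab_size_report` below are hypothetical names invented for illustration; the toy class only mimics the three members discussed here (`vocab_size`, `len(...)`, `get_vocab()`) so the example runs without downloading a model. The `min_embedding_rows` value is an added assumption about how the largest id would typically be used, not something stated in the linked issues.

```python
class ToyTokenizer:
    """Hypothetical stand-in that mimics three members of a Hugging Face tokenizer."""

    def __init__(self, base_vocab, added_tokens):
        self._base = dict(base_vocab)
        self._added = dict(added_tokens)

    @property
    def vocab_size(self):
        # Base vocabulary only, without the added tokens.
        return len(self._base)

    def __len__(self):
        # Full vocabulary, including the added tokens.
        return len(self._base) + len(self._added)

    def get_vocab(self):
        # Token -> id mapping over both base and added tokens.
        return {**self._base, **self._added}


def vocab_size_report(tokenizer):
    """Collect the three size measures side by side."""
    max_id = max(tokenizer.get_vocab().values())
    return {
        "vocab_size": tokenizer.vocab_size,
        "len": len(tokenizer),
        "max_id": max_id,
        # Assumption: an embedding table indexed by token id needs max_id + 1 rows.
        "min_embedding_rows": max_id + 1,
    }


toy = ToyTokenizer(
    base_vocab={"a": 0, "b": 1, "c": 2},
    added_tokens={"<extra>": 10},  # id 10 leaves a gap after id 2
)
report = vocab_size_report(toy)
print(report)
```

With a real tokenizer, the same function should work on the object returned by `AutoTokenizer.from_pretrained(...)`, since it relies only on `vocab_size`, `len(tokenizer)`, and `get_vocab()`.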