tokenizer issues #15
opened by Imran1
Loading the vocab file produces this:
from transformers import Wav2Vec2CTCTokenizer
tokenizer = Wav2Vec2CTCTokenizer.from_pretrained("./", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
Error:
TypeError Traceback (most recent call last)
<ipython-input-26-1f25f0d516f8> in <cell line: 3>()
1 from transformers import Wav2Vec2CTCTokenizer
2
----> 3 tokenizer = Wav2Vec2CTCTokenizer.from_pretrained("./", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
7 frames
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils.py in added_tokens_encoder(self)
389 optimisation in `self._added_tokens_encoder` for the slow tokenizers.
390 """
--> 391 return {k.content: v for v, k in sorted(self._added_tokens_decoder.items(), key=lambda item: item[0])}
392
393 @property
TypeError: '<' not supported between instances of 'str' and 'int'
Here is my vocab dict:
vocab_dict= {
0: ' ',
1: 'ا',
2: 'آ',
3: 'ب',
4: 'پ',
5: 'ت',
6: 'ټ',
7: 'ث',
8: 'ج',
9: 'چ',
10: 'ح',
11: 'خ',
12: 'د',
13: 'ډ',
14: 'ځ',
15: 'ر',
16: 'ړ',
17: 'ژ',
18: 'س',
19: 'ش',
20: 'ص',
21: 'ض',
22: 'ط',
23: 'ظ',
24: 'ع',
25: 'غ',
26: 'ف',
27: 'ق',
28: 'ک',
29: 'ګ',
30: 'ل',
31: 'م',
32: 'ن',
33: 'ڼ',
34: 'و',
35: 'ہ',
36: 'ھ',
37: 'ء',
38: 'ئ',
39: 'ی',
40: 'ے'
}
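A likely cause, judging from the traceback: `Wav2Vec2CTCTokenizer` expects `vocab.json` to map token → id (string keys, integer values), but the dict above is inverted (id → token). When the file is loaded, the integer ids become string keys and the characters become values, so the tokenizer ends up mixing `str` and `int` when it sorts added tokens, which triggers the `'<' not supported between instances of 'str' and 'int'` error. A minimal sketch of the fix, assuming the dict above (truncated here for brevity) is what gets written to `vocab.json`:

```python
import json

# Truncated stand-in for the full id -> token dict from the question.
vocab_dict = {0: ' ', 1: 'ا', 2: 'آ', 3: 'ب'}

# Invert to the token -> id mapping the tokenizer expects.
vocab = {token: idx for idx, token in vocab_dict.items()}

# ensure_ascii=False keeps the Pashto/Urdu characters readable in the file.
with open("vocab.json", "w", encoding="utf-8") as f:
    json.dump(vocab, f, ensure_ascii=False)
```

After saving the inverted mapping, `Wav2Vec2CTCTokenizer.from_pretrained("./", ...)` should be able to load it; the `[UNK]` and `[PAD]` tokens passed as arguments are added on top of this base vocabulary.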