waveletdeboshir committed on
Commit 412d639 · verified · 1 Parent(s): 6aaecbc

Create README.md

Files changed (1)
  1. README.md +148 -0
README.md ADDED
@@ -0,0 +1,148 @@
+ ---
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: automatic-speech-recognition
+ tags:
+ - asr
+ - Pytorch
+ - pruned
+ - audio
+ - automatic-speech-recognition
+ language:
+ - en
+ - zh
+ - de
+ - es
+ - ru
+ - ko
+ - fr
+ - ja
+ - pt
+ - tr
+ - pl
+ - ca
+ - nl
+ - ar
+ - sv
+ - it
+ - id
+ - hi
+ - fi
+ - vi
+ - he
+ - uk
+ - el
+ - ms
+ - cs
+ - ro
+ - da
+ - hu
+ - ta
+ - 'no'
+ - th
+ - ur
+ - hr
+ - bg
+ - lt
+ - la
+ - mi
+ - ml
+ - cy
+ - sk
+ - te
+ - fa
+ - lv
+ - bn
+ - sr
+ - az
+ - sl
+ - kn
+ - et
+ - mk
+ - br
+ - eu
+ - is
+ - hy
+ - ne
+ - mn
+ - bs
+ - kk
+ - sq
+ - sw
+ - gl
+ - mr
+ - pa
+ - si
+ - km
+ - sn
+ - yo
+ - so
+ - af
+ - oc
+ - ka
+ - be
+ - tg
+ - sd
+ - gu
+ - am
+ - yi
+ - lo
+ - uz
+ - fo
+ - ht
+ - ps
+ - tk
+ - nn
+ - mt
+ - sa
+ - lb
+ - my
+ - bo
+ - tl
+ - mg
+ - as
+ - tt
+ - haw
+ - ln
+ - ha
+ - ba
+ - jw
+ - su
+ base_model:
+ - openai/whisper-large-v3-turbo
+ ---
+
+ # Whisper-large-v3-turbo-no-numbers
+
+ ## Model info
+ This is a version of the [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) model without number tokens (the token ids corresponding to numbers are excluded).
+ No fine-tuning was used.
+
+ Phrases with spoken numbers are transcribed with the numbers written out as words.
+
+ Example: instead of "25", this model transcribes the phrase as "twenty five".
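
The card does not say how the number tokens were selected. As a minimal sketch, assuming they can be identified as vocabulary entries whose text contains a digit, the ids could be collected as shown here. The helper name `find_number_token_ids` and the toy vocabulary are hypothetical; with a real tokenizer, the mapping could come from `tokenizer.get_vocab()`.

```python
def find_number_token_ids(vocab):
    """Return the sorted ids of tokens whose text contains at least one digit.

    `vocab` maps token text to token id, in the shape returned by
    `tokenizer.get_vocab()`. Treating "contains a digit" as the definition
    of a number token is an assumption, not the author's stated method.
    """
    return sorted(
        token_id
        for token, token_id in vocab.items()
        if any(ch.isdigit() for ch in token)
    )


# Toy vocabulary for illustration only (hypothetical tokens and ids).
toy_vocab = {"Ġtwenty": 0, "Ġ25": 1, "five": 2, "2": 3, "Ġyears": 4, "1990": 5}

number_ids = find_number_token_ids(toy_vocab)
print(number_ids)  # [1, 3, 5]
```

With the original, unpruned model, a list like this could probably be passed as `suppress_tokens` to `model.generate` to approximate the same behaviour, though that is a different technique from removing the tokens outright.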
+
+ ## Usage
+ The model can be used in the same way as the original Whisper:
+
+ ```python
+ >>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
+ >>> import torchaudio
+
+ >>> # load audio
+ >>> wav, sr = torchaudio.load("audio.wav")
+ >>> # Whisper's feature extractor expects 16 kHz audio, so resample if needed
+ >>> wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=16000)
+
+ >>> # load model and processor
+ >>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-large-v3-turbo-no-numbers")
+ >>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-large-v3-turbo-no-numbers")
+
+ >>> # compute input features from the first audio channel
+ >>> input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt").input_features
+
+ >>> # generate token ids
+ >>> predicted_ids = model.generate(input_features)
+ >>> # decode token ids to text
+ >>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
+ >>> transcription
+ ["<|startoftranscript|><|en|><|transcribe|><|notimestamps|> I'm twenty seven years old. <|endoftext|>"]
+
+ ```
+ The special context tokens (such as `<|startoftranscript|>` and `<|endoftext|>`) can be removed from the transcription by setting `skip_special_tokens=True`.
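
To illustrate what that flag does to the decoded string without loading the model, here is a rough sketch that strips `<|...|>` markers with a regular expression. The helper `strip_special_tokens` is hypothetical and only mimics the visible result. In real use you would call `processor.batch_decode(predicted_ids, skip_special_tokens=True)`, and the tokenizer's exact whitespace handling may differ.

```python
import re

# Whisper's special tokens are written as <|name|>, e.g. <|en|> or <|endoftext|>.
SPECIAL_TOKEN = re.compile(r"<\|[^|]*\|>")


def strip_special_tokens(text):
    """Remove <|...|> markers and surrounding whitespace (illustration only)."""
    return SPECIAL_TOKEN.sub("", text).strip()


raw = (
    "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>"
    " I'm twenty seven years old. <|endoftext|>"
)
clean = strip_special_tokens(raw)
print(clean)  # I'm twenty seven years old.
```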