sanchit-gandhi committed
Commit 58bf923 · 1 Parent(s): d515eda

finish description

Files changed (1)
  1. app.py +18 -9
app.py CHANGED
@@ -177,16 +177,25 @@ if __name__ == "__main__":
     )
     gr.Markdown(
         """
-        Analyse the transcriptions generated by the Whisper and Distil-Whisper models on the TED-LIUM dev set.
-        Analysis is performed on the overall level, where statistics are computed over the entire dev set, and also a per-sample level.
-        The transcriptions for both models are shown at the bottom of the demo. The text diff for each is computed
-        relative to the target transcriptions, where insertions are displayed in <span style='background-color:Lightgreen'>green</span>, and
-        deletions in <span style='background-color:#FFCCCB'><s>red</s></span>.
-
+        One of the major claims of the <a href="https://arxiv.org/abs/2311.00430"> Distil-Whisper paper </a> is
+        that Distil-Whisper hallucinates less than Whisper on long-form audio. To demonstrate this, we'll analyse the
+        transcriptions generated by <a href="https://huggingface.co/openai/whisper-large-v2"> Whisper </a>
+        and <a href="https://huggingface.co/distil-whisper/distil-large-v2"> Distil-Whisper </a> on the
+        <a href="https://huggingface.co/datasets/distil-whisper/tedlium-long-form"> TED-LIUM </a> validation set.
+
         To quantify the amount of repetition and hallucination in the predicted transcriptions, we measure the number
-        of repeated 5-gram word duplicates (5-Dup.) and the insertion error rate (IER). Overall, Distil-Whisper has
-        roughly half the number of 5-Dup. and IER. This indicates that it has a lower propensity to hallucinate
-        compared to the Whisper model.
+        of repeated 5-gram word duplicates (5-Dup.) and the insertion error rate (IER). Analysis is performed at the
+        overall level, where statistics are computed over the entire dataset, and also at a per-sample level (i.e. on
+        an individual example basis).
+
+        The transcriptions for both models are shown at the bottom of the demo. We compute a text difference for each
+        relative to the ground truth transcriptions. Insertions are displayed in <span style='background-color:Lightgreen'>green</span>,
+        and deletions in <span style='background-color:#FFCCCB'><s>red</s></span>. Multiple words in <span style='background-color:Lightgreen'>green</span>
+        indicate that a model has hallucinated, since it has inserted words not present in the ground truth transcription.
+
+        Overall, Distil-Whisper has roughly half the number of 5-Dup. and IER. This indicates that it has a lower
+        propensity to hallucinate compared to the Whisper model. Try both models with some of the TED-LIUM examples
+        and view the reduction in hallucinations for yourself!
         """
     )
     gr.Markdown("**Overall statistics:**")
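The two metrics named in the description can be sketched in plain Python. The definitions below are assumptions, not the paper's or the app's actual code: 5-Dup. is taken to count word 5-grams that occur more than once (each 5-gram seen `c > 1` times contributes `c - 1`), and IER as the number of inserted words in a minimum word-level edit alignment divided by the reference length. Function names are illustrative.

```python
from collections import Counter


def count_ngram_dups(text, n=5):
    """Count repeated word n-grams: each n-gram occurring c > 1 times contributes c - 1."""
    words = text.split()
    counts = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return sum(c - 1 for c in counts.values() if c > 1)


def insertion_count(ref_words, hyp_words):
    """Insertions in a minimum edit alignment (ties broken towards fewer insertions)."""
    R, H = len(ref_words), len(hyp_words)
    # dp[i][j] = (edit distance, insertions) aligning ref_words[:i] with hyp_words[:j]
    dp = [[(0, 0)] * (H + 1) for _ in range(R + 1)]
    for j in range(1, H + 1):
        dp[0][j] = (j, j)  # empty reference: every hypothesis word is an insertion
    for i in range(1, R + 1):
        dp[i][0] = (i, 0)  # empty hypothesis: every reference word is a deletion
        for j in range(1, H + 1):
            sub = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            dp[i][j] = min(
                (dp[i - 1][j - 1][0] + sub, dp[i - 1][j - 1][1]),  # match / substitution
                (dp[i][j - 1][0] + 1, dp[i][j - 1][1] + 1),        # insertion
                (dp[i - 1][j][0] + 1, dp[i - 1][j][1]),            # deletion
            )
    return dp[R][H][1]


def insertion_error_rate(reference, hypothesis):
    """IER sketch: inserted words divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    return insertion_count(ref, hyp) / len(ref)


# A looping transcription: "we need to think about" repeats three times.
repetitive = "so we need to think about how we need to think about how we need to think about it"
print(count_ngram_dups(repetitive))  # → 7

reference = "the cat sat on the mat"
hallucinated = "the cat cat sat on on the mat"  # two spurious word insertions
print(f"{insertion_error_rate(reference, hallucinated):.2f}")  # → 0.33
```

A repetition loop inflates both numbers at once, which is why the two statistics are reported side by side.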
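The green/red text difference described above can be reproduced with the standard library's `difflib`. This is a minimal sketch reusing the colour scheme from the description, not the demo's actual rendering code; `render_word_diff` and the `GREEN`/`RED` templates are hypothetical names.

```python
import difflib
import html

# Colour scheme from the demo description: insertions green, deletions red struck through.
GREEN = "<span style='background-color:Lightgreen'>{}</span>"
RED = "<span style='background-color:#FFCCCB'><s>{}</s></span>"


def render_word_diff(target, prediction):
    """Word-level diff of a predicted transcription against the target, as an HTML string."""
    sm = difflib.SequenceMatcher(a=target.split(), b=prediction.split())
    pieces = []
    for op, a0, a1, b0, b1 in sm.get_opcodes():
        if op == "equal":
            pieces.append(" ".join(sm.a[a0:a1]))
        if op in ("delete", "replace"):  # words in the target missing from the prediction
            pieces.append(RED.format(html.escape(" ".join(sm.a[a0:a1]))))
        if op in ("insert", "replace"):  # words the model added that the target lacks
            pieces.append(GREEN.format(html.escape(" ".join(sm.b[b0:b1]))))
    return " ".join(pieces)


print(render_word_diff("hello world", "hello brave new world"))
```

A long run of green in this rendering is exactly the hallucination signature the demo highlights: a stretch of predicted words with no counterpart in the ground truth.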