bofenghuang committed · Commit a9e66d6 · 1 Parent(s): 702248d
Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -33,7 +33,7 @@ The model was evaluated on both short and long-form transcriptions, using in-dis
 
 Note that Word Error Rate (WER) results shown here are [post-normalization](https://github.com/openai/whisper/blob/main/whisper/normalizers/basic.py), which includes converting text to lowercase and removing symbols and punctuation.
 
-All evaluation results on the public datasets can be found [here]().
+All evaluation results on the public datasets can be found [here](https://drive.google.com/drive/folders/1iJ5GXQap8Bz_Tn_mh58EfCb81UQHvgzi?usp=sharing).
 
 ### Short-Form Transcription
 
@@ -380,7 +380,7 @@ print(result["text"])
 
 ## Training details
 
-We built a French speech recognition dataset of over 22,000 hours of annotated and semi-annotated speech. After decoding this dataset through Whisper Large V3 and filtering out segments with WER above 20%, we retained approximately 10,000 hours of high-quality audio.
+We built a French speech recognition dataset of over 22,000 hours of annotated and semi-annotated speech. After decoding this dataset through Whisper-Large-V3 and filtering out segments with WER above 20%, we retained approximately 10,000 hours of high-quality audio.
 
 | Dataset | Total Duration (h) | Filtered Duration (h) <20% WER |
 |---|---:|---:|
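The changed README paragraphs describe the two WER-related steps: scoring after basic normalization (lowercasing, stripping symbols and punctuation) and dropping segments whose WER exceeds 20%. A minimal Python sketch of that filtering step, assuming word-level Levenshtein WER; the `normalize` helper here is a simplified stand-in for Whisper's `BasicTextNormalizer` (which also handles bracketed text and diacritics), and the `keep_segment` name and its default threshold are illustrative, not from the repository:

```python
import re
import string


def normalize(text: str) -> str:
    """Simplified basic normalization: lowercase, strip punctuation,
    collapse whitespace. A stand-in for Whisper's BasicTextNormalizer."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


def keep_segment(reference: str, hypothesis: str, threshold: float = 0.20) -> bool:
    """Keep a segment only if its post-normalization WER is at or below
    the threshold (the README uses 20%)."""
    return wer(normalize(reference), normalize(hypothesis)) <= threshold
```

For example, `keep_segment("Bonjour, le monde !", "bonjour le monde")` is `True` because the transcripts match exactly after normalization, which is why scoring is done post-normalization: punctuation and casing differences should not count as errors.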