bofenghuang/whisper-large-v3-distil-fr-v0.2

@@ -37,15 +37,17 @@ All evaluation results on the public datasets can be found [here]().
 ### Short-Form Transcription
-| Model | [mcv17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) | [mls](https://huggingface.co/datasets/facebook/multilingual_librispeech) | [voxpopuli](https://huggingface.co/datasets/facebook/voxpopuli) | [mtedx](https://www.openslr.org/100/) | [af_acc](https://www.openslr.org/57/) | [fleurs](https://huggingface.co/datasets/google/fleurs) | zaion1 | zaion2 | zaion3 | zaion4 |
 |-------|--------|-----|------------|--------|--------------|---------|---------|---------|---------|---------|
-| whisper-large-v3 | 10.98 | 4.68 | 11.15 | 8.65 | 7.55 | 5.38 | 24.00 | 27.52 | 32.95 | 24.14 |
-| whisper_large_v3_turbo | 12.25 | 5.08 | 12.21 | 9.87 | 8.37 | 5.50 | 26.49 | 28.33 | 34.80 | 24.94 |
-| whisper-large-v3-french | ~~*8.95*~~ | *4.68* | *9.82* | *8.33* | *5.25* | *5.14* | 22.53 | 27.51 | 29.14 | 22.44 |
-| whisper-large-v3-french-distil-dec16 | ~~*8.86*~~ | *4.28* | *9.66* | *8.14* | *4.93* | *5.37* | 21.70 | 25.20 | 28.83 | 20.46 |
-| whisper-large-v3-french-distil-dec2 | ~~*10.52*~~ | *5.34* | *10.59* | *9.37* | *5.68* | *7.30* | 24.91 | 29.57 | 32.34 | 24.46 |
-| distil-large-v3-fr | *12.64* | *5.84* | 11.84 | 9.65 | 8.83 | 7.81 | 24.34 | 28.77 | 34.05 | 24.10 |
-| whisper-large-v3-distil-fr-v0.2 | *11.10* | *5.00* | *10.68* | *8.75* | *7.09* | 6.35 | 23.01 | 26.91 | 31.46 | 22.33 |
 *Italic* indicates in-distribution (ID) evaluation, where test sets correspond to data distributions seen during training, typically yielding higher performance than out-of-distribution (OOD) evaluation. *~~Italic and strikethrough~~* denotes potential test set contamination - for example, when training and evaluation use different versions of Common Voice, raising the possibility of overlapping data.
@@ -55,14 +57,16 @@ Due to the limited availability of out-of-distribution (OOD) and long-form Frenc
 Long-form transcription evaluation used the 🤗 Hugging Face [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) with both [chunked](https://huggingface.co/blog/asr-chunking) (chunk_length_s=30) and original sequential decoding methods.
-| Model | [dev_data](https://huggingface.co/datasets/speech-recognition-community-v2/dev_data) |  | [mtedx](https://www.openslr.org/100/) |  | zaion5 |  | zaion6 |  |
 |-------|-----------|-----------|---------|-----------|---------|-----------|---------|-----------|
-|  | chunked | sequential | chunked | sequential | chunked | sequential | chunked | sequential |
-| whisper-large-v3 | 9.89 | 8.97 | 9.00 | 8.01 | 40.76 | 30.49 | 32.08 | 25.56 |
-| whisper_large_v3_turbo | 10.11 | 9.00 | 8.49 | 8.45 | 34.59 | 29.35 | 30.00 | 24.84 |
-| whisper-large-v3-french | 9.33 | 9.99 | *9.85* | *9.49* | 35.92 | 29.01 | 29.03 | 23.55 |
-| whisper-large-v3-french-distil-dec16 | 8.97 | 10.11 | *9.61* | *11.72* | 27.14 | 27.57 | 25.25 | 23.66 |
-| whisper-large-v3-french-distil-dec2 | 16.59 | 18.98 | *12.79* | *14.92* | 36.25 | 36.42 | 34.37 | 33.74 |
-| distil-large-v3-fr | 11.31 | 11.34 | 10.36 | 10.52 | 31.38 | 30.32 | 28.05 | 26.43 |
-| whisper-large-v3-distil-fr-v0.2 | 9.44 | 9.84 | *8.94* | *9.03* | 29.40 | 28.54 | 26.17 | 23.75 |

 ### Short-Form Transcription
+<!-- | Model | [mcv17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) | [mls](https://huggingface.co/datasets/facebook/multilingual_librispeech) | [voxpopuli](https://huggingface.co/datasets/facebook/voxpopuli) | [mtedx](https://www.openslr.org/100/) | [af_acc](https://www.openslr.org/57/) | [fleurs](https://huggingface.co/datasets/google/fleurs) | zaion1 | zaion2 | zaion3 | zaion4 |
 |-------|--------|-----|------------|--------|--------------|---------|---------|---------|---------|---------|
+| [whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 10.98 | 4.68 | 11.15 | 8.65 | 7.55 | 5.38 | 24.00 | 27.52 | 32.95 | 24.14 |
+| [whisper_large_v3_turbo](https://huggingface.co/openai/whisper-large-v3-turbo) | 12.25 | 5.08 | 12.21 | 9.87 | 8.37 | 5.50 | 26.49 | 28.33 | 34.80 | 24.94 |
+| [whisper-large-v3-french](https://huggingface.co/bofenghuang/whisper-large-v3-french) | ~~*8.95*~~ | *4.68* | *9.82* | *8.33* | *5.25* | *5.14* | 22.53 | 27.51 | 29.14 | 22.44 |
+| [whisper-large-v3-french-distil-dec16](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec16) | ~~*8.86*~~ | *4.28* | *9.66* | *8.14* | *4.93* | *5.37* | 21.70 | 25.20 | 28.83 | 20.46 |
+| [whisper-large-v3-french-distil-dec2](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec2) | ~~*10.52*~~ | *5.34* | *10.59* | *9.37* | *5.68* | *7.30* | 24.91 | 29.57 | 32.34 | 24.46 |
+| [distil-large-v3-fr](https://huggingface.co/eustlb/distil-large-v3-fr) | *12.64* | *5.84* | 11.84 | 9.65 | 8.83 | 7.81 | 24.34 | 28.77 | 34.05 | 24.10 |
+| whisper-large-v3-distil-fr-v0.2 | *11.10* | *5.00* | *10.68* | *8.75* | *7.09* | 6.35 | 23.01 | 26.91 | 31.46 | 22.33 | -->
+![eval-short-form](https://huggingface.co/bofenghuang/whisper-large-v3-french/resolve/main/assets/eval_short_form.png)
 *Italic* indicates in-distribution (ID) evaluation, where test sets correspond to data distributions seen during training, typically yielding higher performance than out-of-distribution (OOD) evaluation. *~~Italic and strikethrough~~* denotes potential test set contamination - for example, when training and evaluation use different versions of Common Voice, raising the possibility of overlapping data.
 Long-form transcription evaluation used the 🤗 Hugging Face [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) with both [chunked](https://huggingface.co/blog/asr-chunking) (chunk_length_s=30) and original sequential decoding methods.
+![eval-long-form](https://huggingface.co/bofenghuang/whisper-large-v3-french/resolve/main/assets/eval_long_form.png)
+<!-- | Model | [dev_data](https://huggingface.co/datasets/speech-recognition-community-v2/dev_data) |  | [mtedx](https://www.openslr.org/100/) |  | zaion5 |  | zaion6 |  |
 |-------|-----------|-----------|---------|-----------|---------|-----------|---------|-----------|
+|  | chunked | seq | chunked | seq | chunked | seq | chunked | seq |
+| [whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 9.89 | 8.97 | 9.00 | 8.01 | 40.76 | 30.49 | 32.08 | 25.56 |
+| [whisper_large_v3_turbo](https://huggingface.co/openai/whisper-large-v3-turbo) | 10.11 | 9.00 | 8.49 | 8.45 | 34.59 | 29.35 | 30.00 | 24.84 |
+| [whisper-large-v3-french](https://huggingface.co/bofenghuang/whisper-large-v3-french) | 9.33 | 9.99 | *9.85* | *9.49* | 35.92 | 29.01 | 29.03 | 23.55 |
+| [whisper-large-v3-french-distil-dec16](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec16) | 8.97 | 10.11 | *9.61* | *11.72* | 27.14 | 27.57 | 25.25 | 23.66 |
+| [whisper-large-v3-french-distil-dec2](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec2) | 16.59 | 18.98 | *12.79* | *14.92* | 36.25 | 36.42 | 34.37 | 33.74 |
+| [distil-large-v3-fr](https://huggingface.co/eustlb/distil-large-v3-fr) | 11.31 | 11.34 | 10.36 | 10.52 | 31.38 | 30.32 | 28.05 | 26.43 |
+| whisper-large-v3-distil-fr-v0.2 | 9.44 | 9.84 | *8.94* | *9.03* | 29.40 | 28.54 | 26.17 | 23.75 | -->
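
The two long-form decoding modes mentioned above can be sketched with the 🤗 `pipeline` as follows. This is a minimal illustration, not the evaluation script behind the reported numbers: the helper names `decoding_kwargs` and `transcribe` are hypothetical, and only `chunk_length_s=30` comes from the text. Passing `return_timestamps=True` for sequential decoding is an assumption based on how Whisper's own long-form generation is usually invoked; language, batch size, and other generation settings are left at their defaults.

```python
from typing import Any, Dict

# Repo id taken from this page; swap in any other Whisper checkpoint to compare.
MODEL_ID = "bofenghuang/whisper-large-v3-distil-fr-v0.2"


def decoding_kwargs(chunked: bool) -> Dict[str, Any]:
    """Call-time arguments for the two long-form decoding modes (hypothetical helper)."""
    if chunked:
        # Chunked: the pipeline splits the audio into 30 s windows and merges the outputs.
        return {"chunk_length_s": 30}
    # Sequential: no pipeline chunking; Whisper's long-form generation is used instead.
    # Assumption: timestamps are requested, as that mode typically requires them.
    return {"return_timestamps": True}


def transcribe(audio_path: str, chunked: bool, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with either decoding mode (hypothetical helper)."""
    # Imported lazily so decoding_kwargs can be used without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, **decoding_kwargs(chunked))
    return result["text"]
```

For example, `transcribe("long_audio.wav", chunked=True)` and `transcribe("long_audio.wav", chunked=False)` would produce the two hypotheses compared in the long-form table, where `long_audio.wav` is a placeholder path.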

assets/eval_short_form.png ADDED