bofenghuang committed on
Commit f81ed15 · 1 Parent(s): f1500d9
Files changed (2)
  1. README.md +21 -17
  2. assets/eval_short_form.png +0 -0
README.md CHANGED
@@ -37,15 +37,17 @@ All evaluation results on the public datasets can be found [here]().
 
  ### Short-Form Transcription
 
- | Model | [mcv17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) | [mls](https://huggingface.co/datasets/facebook/multilingual_librispeech) | [voxpopuli](https://huggingface.co/datasets/facebook/voxpopuli) | [mtedx](https://www.openslr.org/100/) | [af_acc](https://www.openslr.org/57/) | [fleurs](https://huggingface.co/datasets/google/fleurs) | zaion1 | zaion2 | zaion3 | zaion4 |
+ <!-- | Model | [mcv17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) | [mls](https://huggingface.co/datasets/facebook/multilingual_librispeech) | [voxpopuli](https://huggingface.co/datasets/facebook/voxpopuli) | [mtedx](https://www.openslr.org/100/) | [af_acc](https://www.openslr.org/57/) | [fleurs](https://huggingface.co/datasets/google/fleurs) | zaion1 | zaion2 | zaion3 | zaion4 |
  |-------|--------|-----|------------|--------|--------------|---------|---------|---------|---------|---------|
- | whisper-large-v3 | 10.98 | 4.68 | 11.15 | 8.65 | 7.55 | 5.38 | 24.00 | 27.52 | 32.95 | 24.14 |
- | whisper_large_v3_turbo | 12.25 | 5.08 | 12.21 | 9.87 | 8.37 | 5.50 | 26.49 | 28.33 | 34.80 | 24.94 |
- | whisper-large-v3-french | ~~*8.95*~~ | *4.68* | *9.82* | *8.33* | *5.25* | *5.14* | 22.53 | 27.51 | 29.14 | 22.44 |
- | whisper-large-v3-french-distil-dec16 | ~~*8.86*~~ | *4.28* | *9.66* | *8.14* | *4.93* | *5.37* | 21.70 | 25.20 | 28.83 | 20.46 |
- | whisper-large-v3-french-distil-dec2 | ~~*10.52*~~ | *5.34* | *10.59* | *9.37* | *5.68* | *7.30* | 24.91 | 29.57 | 32.34 | 24.46 |
- | distil-large-v3-fr | *12.64* | *5.84* | 11.84 | 9.65 | 8.83 | 7.81 | 24.34 | 28.77 | 34.05 | 24.10 |
- | whisper-large-v3-distil-fr-v0.2 | *11.10* | *5.00* | *10.68* | *8.75* | *7.09* | 6.35 | 23.01 | 26.91 | 31.46 | 22.33 |
+ | [whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 10.98 | 4.68 | 11.15 | 8.65 | 7.55 | 5.38 | 24.00 | 27.52 | 32.95 | 24.14 |
+ | [whisper_large_v3_turbo](https://huggingface.co/openai/whisper-large-v3-turbo) | 12.25 | 5.08 | 12.21 | 9.87 | 8.37 | 5.50 | 26.49 | 28.33 | 34.80 | 24.94 |
+ | [whisper-large-v3-french](https://huggingface.co/bofenghuang/whisper-large-v3-french) | ~~*8.95*~~ | *4.68* | *9.82* | *8.33* | *5.25* | *5.14* | 22.53 | 27.51 | 29.14 | 22.44 |
+ | [whisper-large-v3-french-distil-dec16](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec16) | ~~*8.86*~~ | *4.28* | *9.66* | *8.14* | *4.93* | *5.37* | 21.70 | 25.20 | 28.83 | 20.46 |
+ | [whisper-large-v3-french-distil-dec2](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec2) | ~~*10.52*~~ | *5.34* | *10.59* | *9.37* | *5.68* | *7.30* | 24.91 | 29.57 | 32.34 | 24.46 |
+ | [distil-large-v3-fr](https://huggingface.co/eustlb/distil-large-v3-fr) | *12.64* | *5.84* | 11.84 | 9.65 | 8.83 | 7.81 | 24.34 | 28.77 | 34.05 | 24.10 |
+ | whisper-large-v3-distil-fr-v0.2 | *11.10* | *5.00* | *10.68* | *8.75* | *7.09* | 6.35 | 23.01 | 26.91 | 31.46 | 22.33 | -->
+
+ ![eval-short-form](https://huggingface.co/bofenghuang/whisper-large-v3-french/resolve/main/assets/eval_short_form.png)
 
  *Italic* indicates in-distribution (ID) evaluation, where test sets correspond to data distributions seen during training, typically yielding higher performance than out-of-distribution (OOD) evaluation. *~~Italic and strikethrough~~* denotes potential test set contamination - for example, when training and evaluation use different versions of Common Voice, raising the possibility of overlapping data.
 
@@ -55,14 +57,16 @@ Due to the limited availability of out-of-distribution (OOD) and long-form Frenc
 
  Long-form transcription evaluation used the 🤗 Hugging Face [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) with both [chunked](https://huggingface.co/blog/asr-chunking) (chunk_length_s=30) and original sequential decoding methods.
 
- | Model | [dev_data](https://huggingface.co/datasets/speech-recognition-community-v2/dev_data) | | [mtedx](https://www.openslr.org/100/) | | zaion5 | | zaion6 | |
+ ![eval-long-form](https://huggingface.co/bofenghuang/whisper-large-v3-french/resolve/main/assets/eval_long_form.png)
+
+ <!-- | Model | [dev_data](https://huggingface.co/datasets/speech-recognition-community-v2/dev_data) | | [mtedx](https://www.openslr.org/100/) | | zaion5 | | zaion6 | |
  |-------|-----------|-----------|---------|-----------|---------|-----------|---------|-----------|
- | | chunked | sequential | chunked | sequential | chunked | sequential | chunked | sequential |
- | whisper-large-v3 | 9.89 | 8.97 | 9.00 | 8.01 | 40.76 | 30.49 | 32.08 | 25.56 |
- | whisper_large_v3_turbo | 10.11 | 9.00 | 8.49 | 8.45 | 34.59 | 29.35 | 30.00 | 24.84 |
- | whisper-large-v3-french | 9.33 | 9.99 | *9.85* | *9.49* | 35.92 | 29.01 | 29.03 | 23.55 |
- | whisper-large-v3-french-distil-dec16 | 8.97 | 10.11 | *9.61* | *11.72* | 27.14 | 27.57 | 25.25 | 23.66 |
- | whisper-large-v3-french-distil-dec2 | 16.59 | 18.98 | *12.79* | *14.92* | 36.25 | 36.42 | 34.37 | 33.74 |
- | distil-large-v3-fr | 11.31 | 11.34 | 10.36 | 10.52 | 31.38 | 30.32 | 28.05 | 26.43 |
- | whisper-large-v3-distil-fr-v0.2 | 9.44 | 9.84 | *8.94* | *9.03* | 29.40 | 28.54 | 26.17 | 23.75 |
+ | | chunked | seq | chunked | seq | chunked | seq | chunked | seq |
+ | [whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 9.89 | 8.97 | 9.00 | 8.01 | 40.76 | 30.49 | 32.08 | 25.56 |
+ | [whisper_large_v3_turbo](https://huggingface.co/openai/whisper-large-v3-turbo) | 10.11 | 9.00 | 8.49 | 8.45 | 34.59 | 29.35 | 30.00 | 24.84 |
+ | [whisper-large-v3-french](https://huggingface.co/bofenghuang/whisper-large-v3-french) | 9.33 | 9.99 | *9.85* | *9.49* | 35.92 | 29.01 | 29.03 | 23.55 |
+ | [whisper-large-v3-french-distil-dec16](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec16) | 8.97 | 10.11 | *9.61* | *11.72* | 27.14 | 27.57 | 25.25 | 23.66 |
+ | [whisper-large-v3-french-distil-dec2](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec2) | 16.59 | 18.98 | *12.79* | *14.92* | 36.25 | 36.42 | 34.37 | 33.74 |
+ | [distil-large-v3-fr](https://huggingface.co/eustlb/distil-large-v3-fr) | 11.31 | 11.34 | 10.36 | 10.52 | 31.38 | 30.32 | 28.05 | 26.43 |
+ | whisper-large-v3-distil-fr-v0.2 | 9.44 | 9.84 | *8.94* | *9.03* | 29.40 | 28.54 | 26.17 | 23.75 | -->
 
assets/eval_short_form.png ADDED