bofenghuang committed on
Commit f81ed15 · 1 Parent(s): f1500d9
up

- README.md +21 -17
- assets/eval_short_form.png +0 -0
README.md
CHANGED
@@ -37,15 +37,17 @@ All evaluation results on the public datasets can be found [here]().
 
 ### Short-Form Transcription
 
-| Model | [mcv17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) | [mls](https://huggingface.co/datasets/facebook/multilingual_librispeech) | [voxpopuli](https://huggingface.co/datasets/facebook/voxpopuli) | [mtedx](https://www.openslr.org/100/) | [af_acc](https://www.openslr.org/57/) | [fleurs](https://huggingface.co/datasets/google/fleurs) | zaion1 | zaion2 | zaion3 | zaion4 |
+<!-- | Model | [mcv17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) | [mls](https://huggingface.co/datasets/facebook/multilingual_librispeech) | [voxpopuli](https://huggingface.co/datasets/facebook/voxpopuli) | [mtedx](https://www.openslr.org/100/) | [af_acc](https://www.openslr.org/57/) | [fleurs](https://huggingface.co/datasets/google/fleurs) | zaion1 | zaion2 | zaion3 | zaion4 |
 |-------|--------|-----|------------|--------|--------------|---------|---------|---------|---------|---------|
-| whisper-large-v3 | 10.98 | 4.68 | 11.15 | 8.65 | 7.55 | 5.38 | 24.00 | 27.52 | 32.95 | 24.14 |
-| whisper_large_v3_turbo | 12.25 | 5.08 | 12.21 | 9.87 | 8.37 | 5.50 | 26.49 | 28.33 | 34.80 | 24.94 |
-| whisper-large-v3-french | ~~*8.95*~~ | *4.68* | *9.82* | *8.33* | *5.25* | *5.14* | 22.53 | 27.51 | 29.14 | 22.44 |
-| whisper-large-v3-french-distil-dec16 | ~~*8.86*~~ | *4.28* | *9.66* | *8.14* | *4.93* | *5.37* | 21.70 | 25.20 | 28.83 | 20.46 |
-| whisper-large-v3-french-distil-dec2 | ~~*10.52*~~ | *5.34* | *10.59* | *9.37* | *5.68* | *7.30* | 24.91 | 29.57 | 32.34 | 24.46 |
-| distil-large-v3-fr | *12.64* | *5.84* | 11.84 | 9.65 | 8.83 | 7.81 | 24.34 | 28.77 | 34.05 | 24.10 |
-| whisper-large-v3-distil-fr-v0.2 | *11.10* | *5.00* | *10.68* | *8.75* | *7.09* | 6.35 | 23.01 | 26.91 | 31.46 | 22.33 |
+| [whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 10.98 | 4.68 | 11.15 | 8.65 | 7.55 | 5.38 | 24.00 | 27.52 | 32.95 | 24.14 |
+| [whisper_large_v3_turbo](https://huggingface.co/openai/whisper-large-v3-turbo) | 12.25 | 5.08 | 12.21 | 9.87 | 8.37 | 5.50 | 26.49 | 28.33 | 34.80 | 24.94 |
+| [whisper-large-v3-french](https://huggingface.co/bofenghuang/whisper-large-v3-french) | ~~*8.95*~~ | *4.68* | *9.82* | *8.33* | *5.25* | *5.14* | 22.53 | 27.51 | 29.14 | 22.44 |
+| [whisper-large-v3-french-distil-dec16](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec16) | ~~*8.86*~~ | *4.28* | *9.66* | *8.14* | *4.93* | *5.37* | 21.70 | 25.20 | 28.83 | 20.46 |
+| [whisper-large-v3-french-distil-dec2](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec2) | ~~*10.52*~~ | *5.34* | *10.59* | *9.37* | *5.68* | *7.30* | 24.91 | 29.57 | 32.34 | 24.46 |
+| [distil-large-v3-fr](https://huggingface.co/eustlb/distil-large-v3-fr) | *12.64* | *5.84* | 11.84 | 9.65 | 8.83 | 7.81 | 24.34 | 28.77 | 34.05 | 24.10 |
+| whisper-large-v3-distil-fr-v0.2 | *11.10* | *5.00* | *10.68* | *8.75* | *7.09* | 6.35 | 23.01 | 26.91 | 31.46 | 22.33 | -->
+
+![eval-short-form](https://huggingface.co/bofenghuang/whisper-large-v3-french/resolve/main/assets/eval_short_form.png)
 
 *Italic* indicates in-distribution (ID) evaluation, where test sets correspond to data distributions seen during training, typically yielding higher performance than out-of-distribution (OOD) evaluation. *~~Italic and strikethrough~~* denotes potential test set contamination - for example, when training and evaluation use different versions of Common Voice, raising the possibility of overlapping data.
 
@@ -55,14 +57,16 @@ Due to the limited availability of out-of-distribution (OOD) and long-form Frenc
 
 Long-form transcription evaluation used the 🤗 Hugging Face [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) with both [chunked](https://huggingface.co/blog/asr-chunking) (chunk_length_s=30) and original sequential decoding methods.
 
-
+![eval-long-form](https://huggingface.co/bofenghuang/whisper-large-v3-french/resolve/main/assets/eval_long_form.png)
+
+<!-- | Model | [dev_data](https://huggingface.co/datasets/speech-recognition-community-v2/dev_data) | | [mtedx](https://www.openslr.org/100/) | | zaion5 | | zaion6 | |
 |-------|-----------|-----------|---------|-----------|---------|-----------|---------|-----------|
-| | chunked |
-| whisper-large-v3 | 9.89 | 8.97 | 9.00 | 8.01 | 40.76 | 30.49 | 32.08 | 25.56 |
-| whisper_large_v3_turbo | 10.11 | 9.00 | 8.49 | 8.45 | 34.59 | 29.35 | 30.00 | 24.84 |
-| whisper-large-v3-french | 9.33 | 9.99 | *9.85* | *9.49* | 35.92 | 29.01 | 29.03 | 23.55 |
-| whisper-large-v3-french-distil-dec16 | 8.97 | 10.11 | *9.61* | *11.72* | 27.14 | 27.57 | 25.25 | 23.66 |
-| whisper-large-v3-french-distil-dec2 | 16.59 | 18.98 | *12.79* | *14.92* | 36.25 | 36.42 | 34.37 | 33.74 |
-| distil-large-v3-fr | 11.31 | 11.34 | 10.36 | 10.52 | 31.38 | 30.32 | 28.05 | 26.43 |
-| whisper-large-v3-distil-fr-v0.2 | 9.44 | 9.84 | *8.94* | *9.03* | 29.40 | 28.54 | 26.17 | 23.75 |
+| | chunked | seq | chunked | seq | chunked | seq | chunked | seq |
+| [whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 9.89 | 8.97 | 9.00 | 8.01 | 40.76 | 30.49 | 32.08 | 25.56 |
+| [whisper_large_v3_turbo](https://huggingface.co/openai/whisper-large-v3-turbo) | 10.11 | 9.00 | 8.49 | 8.45 | 34.59 | 29.35 | 30.00 | 24.84 |
+| [whisper-large-v3-french](https://huggingface.co/bofenghuang/whisper-large-v3-french) | 9.33 | 9.99 | *9.85* | *9.49* | 35.92 | 29.01 | 29.03 | 23.55 |
+| [whisper-large-v3-french-distil-dec16](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec16) | 8.97 | 10.11 | *9.61* | *11.72* | 27.14 | 27.57 | 25.25 | 23.66 |
+| [whisper-large-v3-french-distil-dec2](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec2) | 16.59 | 18.98 | *12.79* | *14.92* | 36.25 | 36.42 | 34.37 | 33.74 |
+| [distil-large-v3-fr](https://huggingface.co/eustlb/distil-large-v3-fr) | 11.31 | 11.34 | 10.36 | 10.52 | 31.38 | 30.32 | 28.05 | 26.43 |
+| whisper-large-v3-distil-fr-v0.2 | 9.44 | 9.84 | *8.94* | *9.03* | 29.40 | 28.54 | 26.17 | 23.75 | -->
 

assets/eval_short_form.png
ADDED
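
The legend in the short-form section of the README changed here encodes each score's evaluation condition in its markup: plain numbers are out-of-distribution (OOD), `*italic*` numbers are in-distribution (ID), and `~~*italic and strikethrough*~~` numbers flag potential test-set contamination. The minimal Python sketch below reads that convention back out of a table row, for example to average only the OOD cells. The helpers `classify_cell`, `parse_row`, and `ood_mean` and their label strings are hypothetical (not part of the repository), and the parser assumes rows shaped like the pre-commit table, i.e. without the Markdown links or the trailing `-->` introduced by this commit.

```python
import re
from statistics import mean
from typing import List, Tuple

# Hypothetical labels for the README legend:
#   24.00        -> "OOD"           (out-of-distribution)
#   *4.68*       -> "ID"            (in-distribution)
#   ~~*8.95*~~   -> "contaminated"  (potential test-set contamination)
_CELL = re.compile(r"^(~~)?(\*)?(\d+(?:\.\d+)?)(\*)?(~~)?$")


def classify_cell(cell: str) -> Tuple[float, str]:
    """Return (score, label) for one table cell; reject unknown or unbalanced markup."""
    match = _CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised cell: {cell!r}")
    strike_open, star_open, number, star_close, strike_close = match.groups()
    # Opening and closing markers must come in pairs.
    if (strike_open is None) != (strike_close is None) or (star_open is None) != (star_close is None):
        raise ValueError(f"unbalanced markup: {cell!r}")
    if strike_open:
        label = "contaminated"
    elif star_open:
        label = "ID"
    else:
        label = "OOD"
    return float(number), label


def parse_row(row: str) -> Tuple[str, List[Tuple[float, str]]]:
    """Split '| model | v1 | v2 | ... |' into the model name and its classified scores."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    return cells[0], [classify_cell(c) for c in cells[1:]]


def ood_mean(row: str) -> float:
    """Average only the cells the legend marks as out-of-distribution."""
    _, scores = parse_row(row)
    ood = [score for score, label in scores if label == "OOD"]
    if not ood:
        raise ValueError("row has no out-of-distribution cells")
    return mean(ood)
```

The set of OOD columns differs from row to row (`distil-large-v3-fr` has eight plain cells, `whisper-large-v3-french` only four), so per-row averages like these are not directly comparable across models.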
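
The long-form section of the README compares chunked decoding (`chunk_length_s=30`) with the original sequential method. Chunked decoding, as described in the linked blog post, cuts long audio into fixed-length windows that overlap on both sides, transcribes each window independently, and uses the overlapping margins to stitch neighbouring transcripts together. The sketch below only computes that window layout in seconds. `chunk_windows` is a hypothetical helper, not a pipeline API, and the per-side stride of `chunk_length_s / 6` is an assumption; check the `AutomaticSpeechRecognitionPipeline` documentation for the actual default of its `stride_length_s` parameter.

```python
from typing import List, Optional, Tuple

# (start_s, end_s, stride_left_s, stride_right_s) for one chunk
Window = Tuple[float, float, float, float]


def chunk_windows(duration_s: float,
                  chunk_length_s: float = 30.0,
                  stride_length_s: Optional[float] = None) -> List[Window]:
    """Lay out overlapping fixed-length windows over `duration_s` seconds of audio.

    Hypothetical helper for illustration only. Each window is at most
    `chunk_length_s` long; interior windows carry `stride_length_s` of
    overlap on each side. The first window has no left stride and the
    last window has no right stride.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if stride_length_s is None:
        stride_length_s = chunk_length_s / 6  # assumed default, not read from the library
    step = chunk_length_s - 2 * stride_length_s  # distance between consecutive window starts
    if step <= 0:
        raise ValueError("strides must be shorter than half the chunk length")

    windows: List[Window] = []
    start = 0.0
    while True:
        is_last = start + chunk_length_s >= duration_s
        end = min(start + chunk_length_s, duration_s)
        left = 0.0 if start == 0.0 else stride_length_s
        right = 0.0 if is_last else stride_length_s
        windows.append((start, end, left, right))
        if is_last:
            return windows
        start += step
```

With the assumed 5 s strides, a 65 s recording yields three windows starting at 0, 20, and 40 s. The original sequential method instead advances one 30 s window at a time according to the timestamps the model predicts, so its window boundaries cannot be precomputed this way.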