Space: japanese-asr/README

asahi417 committed on Sep 18, 2024

Commit 03872a0 (verified) · 1 parent: 388fa98

Update README.md

Files changed (1)

README.md +6 -22

@@ -12,7 +12,8 @@ This repository contains all the models and datasets for train/evaluate the Japa
 Following table shows CER comparison with different data size of ReazonSpeech used to distill [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3). The model names follows
 `japanese-asr/distil-whisper-large-v3-ja-reazonspeech-{size of reazonspeech}`.
-***CER (Normalized)***: Normalized CER is the commonly used metric for Japanese ASR, where the punctuations and special characters are ignored.
 | model                                                                                                                                             |   [CommonVoice 8.0](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) |   [JSUT basic5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) |   [ReazonSpeech Test](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
 |:--------------------------------------------------------------------------------------------------------------------------------------------------|-------------------:|-----------------:|--------------------:|
 | [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all)       |               9.20 |             8.40 |               11.63 |
@@ -21,29 +22,12 @@ Following table shows CER comparison with different data size of ReazonSpeech us
 | [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small)   |              30.48 |            38.96 |               42.29 |
 | [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny)     |              94.69 |            95.32 |               95.82 |
 | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)                                                                         |               8.52 |             7.18 |               15.18 |
-| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)                                     |                                    9.7 |                                  8.2 |                                    28.5 |
-| [openai/whisper-large](https://huggingface.co/openai/whisper-large)                                        |                                   10   |                                  8.9 |                                    34.4 |
-| [openai/whisper-base](https://huggingface.co/openai/whisper-base)                                         |                                   28.2 |                                 25   |                                    69.4 |
 | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)                                                                             |              11.34 |             9.87 |               29.56 |
 | [openai/whisper-small](https://huggingface.co/openai/whisper-small)                                                                               |              15.26 |            14.22 |               34.29 |
 | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)                                                                                 |              46.86 |            35.69 |               96.69 |
 | [reazon-research/reazonspeech-nemo-v2](https://huggingface.co/reazon-research/reazonspeech-nemo-v2)                                               |               9.07 |             7.43 |               11.17 |
-<!-- ***CER (Raw)***: In contrast to the normalized CER, raw CER is namely the CER computed over the raw prediction and label without any normalizaion.
-| model                                                                                                                                             |   [CommonVoice 8.0](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) |   [JSUT basic5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) |   [ReazonSpeech Test](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
-|:------------------------------------------------------------|---------------------------------------:|-------------------------------------:|----------------------------------------:|
-| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all)    |                                   15.4 |                                 15.4 |                                    17.4 |
-| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large)  |                                   15.5 |                                 15.2 |                                    17.8 |
-| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium) |                                   17   |                                 18.4 |                                    20.2 |
-| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small)  |                                   34.4 |                                 43.2 |                                    44.2 |
-| [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny)   |                                   93.7 |                                 95.1 |                                    95.6 |
-| [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)                                      |                                   12.9 |                                 13.4 |                                    20.6 |
-| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)                                     |                                   13.5 |                                 10.6 |                                    34.4 |
-| [openai/whisper-large](https://huggingface.co/openai/whisper-large)                                        |                                   14   |                                 11.2 |                                    40.4 |
-| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)                                       |                                   15.4 |                                 13   |                                    39   |
-| [openai/whisper-base](https://huggingface.co/openai/whisper-base)                                         |                                   31.6 |                                 26.4 |                                    74.5 |
-| [openai/whisper-small](https://huggingface.co/openai/whisper-small)                                       |                                   19.8 |                                 18.8 |                                    47   |
-| [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)                                         |                                   61.3 |                                 39.4 |                                   156.5 |
-| [reazon-research/reazonspeech-nemo-v2](https://huggingface.co/reazon-research/reazonspeech-nemo-v2)                        |                                   12.6 |                                 10.6 |                                    15.4 |
-   -->

 Following table shows CER comparison with different data size of ReazonSpeech used to distill [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3). The model names follows
 `japanese-asr/distil-whisper-large-v3-ja-reazonspeech-{size of reazonspeech}`.
+***CER***
 | model                                                                                                                                             |   [CommonVoice 8.0](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) |   [JSUT basic5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) |   [ReazonSpeech Test](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
 |:--------------------------------------------------------------------------------------------------------------------------------------------------|-------------------:|-----------------:|--------------------:|
 | [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all)       |               9.20 |             8.40 |               11.63 |
 | [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small)   |              30.48 |            38.96 |               42.29 |
 | [japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny](https://huggingface.co/japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny)     |              94.69 |            95.32 |               95.82 |
 | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)                                                                         |               8.52 |             7.18 |               15.18 |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)                                                                         |               9.70 |             8.20 |               28.50 |
+| [openai/whisper-large](https://huggingface.co/openai/whisper-large)                                                                               |              10.00 |             8.90 |               34.40 |
+| [openai/whisper-base](https://huggingface.co/openai/whisper-base)                                                                                 |              28.20 |            25.00 |               69.40 |
 | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)                                                                             |              11.34 |             9.87 |               29.56 |
 | [openai/whisper-small](https://huggingface.co/openai/whisper-small)                                                                               |              15.26 |            14.22 |               34.29 |
 | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)                                                                                 |              46.86 |            35.69 |               96.69 |
 | [reazon-research/reazonspeech-nemo-v2](https://huggingface.co/reazon-research/reazonspeech-nemo-v2)                                               |               9.07 |             7.43 |               11.17 |
+Please find more detailed results at [kotoba-whisper codebase](https://github.com/kotoba-tech/kotoba-whisper).
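
The README text in this diff describes normalized CER as a character error rate in which punctuation and special characters are ignored, and raw CER as the same metric computed without normalization. The sketch below illustrates that distinction. It is not the evaluation code behind the tables: the `normalize`, `edit_distance`, and `cer` helpers are hypothetical, and the normalization rule (NFKC folding, then dropping Unicode punctuation, symbol, separator, and control characters) is an assumption, since the diff does not show the actual rule.

```python
import unicodedata


def normalize(text: str) -> str:
    """Assumed normalization: NFKC-fold, then drop punctuation (P*),
    symbols (S*), separators/whitespace (Z*), and control chars (C*)."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith(("P", "S", "Z", "C"))
    )


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, prediction: str, normalized: bool = True) -> float:
    """CER in percent: edit distance divided by reference length."""
    if normalized:
        reference, prediction = normalize(reference), normalize(prediction)
    if not reference:
        raise ValueError("reference is empty after normalization")
    return 100.0 * edit_distance(reference, prediction) / len(reference)


if __name__ == "__main__":
    ref = "今日は、いい天気です。"
    hyp = "今日はいい天気です"
    print(cer(ref, hyp))                    # normalized CER
    print(cer(ref, hyp, normalized=False))  # raw CER
```

Because insertions count as errors while the denominator is the reference length, CER can exceed 100 when a model emits far more characters than the reference contains.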