Automatic Speech Recognition · Transformers · Safetensors · Japanese · whisper · audio · hf-asr-leaderboard · Eval Results · Inference Endpoints
asahi417 committed · Commit 4d61b55 · verified · 1 Parent(s): a81f3fe

Update README.md

Files changed (1): README.md (+24 −14)
README.md CHANGED
@@ -89,24 +89,34 @@ from ReazonSpeech, and achieves competitive CER and WER on the out-of-domain tes
  the Japanese subset from [CommonVoice 8.0](https://huggingface.co/datasets/common_voice) (see [Evaluation](#evaluation) for detail).
 
  - ***CER***
 
- | Model | CommonVoice 8.0 (Japanese) | JSUT Basic 5000 | ReazonSpeech Test |
- |:------|---------------------------:|----------------:|------------------:|
- | [**kotoba-tech/kotoba-whisper-v1.0**](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 9.44 | 8.48 | **12.60** |
- | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | **8.52** | **7.18** | 15.18 |
- | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 11.34 | 9.87 | 29.56 |
- | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 15.26 | 14.22 | 34.29 |
- | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 46.86 | 35.69 | 96.69 |
+ | model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
+ |:------|---------------------------:|----------------:|------------------:|
+ | [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 9.2 | 8.4 | 11.6 |
+ | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 9.4 | 8.5 | 12.2 |
+ | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 8.5 | 7.1 | 14.9 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 9.7 | 8.2 | 28.1 |
+ | [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 10 | 8.9 | 34.1 |
+ | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 11.5 | 10 | 33.2 |
+ | [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 28.6 | 24.9 | 70.4 |
+ | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 15.1 | 14.2 | 41.5 |
+ | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 53.7 | 36.5 | 137.9 |
+
 
  - ***WER***
 
- | Model | CommonVoice 8.0 (Japanese) | JSUT Basic 5000 | ReazonSpeech Test |
- |:------|---------------------------:|----------------:|------------------:|
- | [**kotoba-tech/kotoba-whisper-v1.0**](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 59.27 | 64.36 | **56.62** |
- | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | **55.41** | **59.34** | 60.23 |
- | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 63.64 | 69.52 | 76.04 |
- | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 74.21 | 82.02 | 82.99 |
- | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 93.78 | 97.72 | 94.85 |
+ | model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
+ |:------|---------------------------:|----------------:|------------------:|
+ | [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 58.8 | 63.7 | 55.6 |
+ | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 59.2 | 64.3 | 56.4 |
+ | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 55.1 | 59.2 | 60.2 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 59.3 | 63.2 | 74.1 |
+ | [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 61.1 | 66.4 | 74.9 |
+ | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 63.4 | 69.5 | 76 |
+ | [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 87.2 | 93 | 91.8 |
+ | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 74.2 | 81.9 | 83 |
+ | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 93.8 | 97.6 | 94.9 |
+
 
  - ***Latency***: As kotoba-whisper uses the same architecture as [distil-whisper/distil-large-v3](https://huggingface.co/distil-whisper/distil-large-v3),
   it inherits the benefit of the improved latency compared to [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
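
The updated tables report CER (character error rate) and WER (word error rate); lower is better. Below is a minimal sketch, not part of this commit, of how such scores can be computed with the Hugging Face `evaluate` library; the reference and prediction strings are made-up examples, and the actual evaluation setup is the one described in the model card's Evaluation section.

```python
# Minimal sketch (not from this commit): computing CER/WER with the
# Hugging Face `evaluate` library. The example strings are illustrative only.
import evaluate

cer = evaluate.load("cer")  # character error rate
wer = evaluate.load("wer")  # word error rate (assumes whitespace-separated words)

references = ["これはテストです"]   # made-up ground-truth transcript
predictions = ["これはテストれす"]  # made-up model output with one character error

# CER is the primary metric for Japanese, which has no whitespace word
# boundaries; the WER figures in the tables presumably rely on a word
# segmentation step that is not shown here.
print("CER:", cer.compute(predictions=predictions, references=references))
print("WER:", wer.compute(predictions=predictions, references=references))
```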
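
The latency point above rests on kotoba-whisper sharing the architecture of distil-whisper/distil-large-v3. For context, a minimal usage sketch, not taken from this commit, using the standard transformers ASR pipeline; `kotoba-tech/kotoba-whisper-v2.0` is the model added to the tables and `audio.mp3` is a placeholder path.

```python
# Minimal usage sketch (not from this commit): transcribing Japanese audio with
# the transformers ASR pipeline. "audio.mp3" is a placeholder path.
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="kotoba-tech/kotoba-whisper-v2.0",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

# Whisper-style generation arguments: transcribe Japanese rather than translate.
result = pipe("audio.mp3", generate_kwargs={"language": "ja", "task": "transcribe"})
print(result["text"])
```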