Comparison with other methods
Hi, thanks for the model!
You state that this model is better than other available ones; do you have results backing this claim? Especially compared to this one, which seemed pretty good in 2022.
For example, what is the performance on the Hungarian test set of Common Voice?
Thanks in advance for the answer!
PS: We are looking for an ASR system to get transcripts of Hungarian videos for a research project.
Hi!
It's not the overall best Hungarian-specific ASR model, it's only the best whisper-base Hungarian-specific model. If you're looking for the best ASR for the Hungarian language, check my tests here: https://huggingface.co/sarpba/whisper-teszt-eredmenyek
It's a test across 4 datasets, and the test repo contains the test scripts too. Unfortunately, there is no overall winner: some models achieve better results on one dataset, some on another.
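For reference, here is a minimal sketch of how WER/CER numbers like the ones below can be computed. It assumes jiwer plus the Whisper BasicTextNormalizer for the "Norm" variants, which is my guess at the approach; the scripts actually used are in the test repo linked above:

```python
# pip install jiwer transformers
# Minimal sketch: plain and normalized WER/CER for reference/hypothesis pairs.
# The choice of normalizer is an assumption; see the linked test repo for the
# scripts actually used in the comparison.
import jiwer
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

normalizer = BasicTextNormalizer()  # language-agnostic: lowercase, strip punctuation

refs = ["Ez egy példa mondat."]   # ground-truth transcripts
hyps = ["ez egy pelda mondat"]    # model outputs

print(f"WER: {jiwer.wer(refs, hyps):.2%}   CER: {jiwer.cer(refs, hyps):.2%}")

norm_refs = [normalizer(r) for r in refs]
norm_hyps = [normalizer(h) for h in hyps]
print(f"Norm WER: {jiwer.wer(norm_refs, norm_hyps):.2%}   "
      f"Norm CER: {jiwer.cer(norm_refs, norm_hyps):.2%}")
```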
If it's not urgent: the whisper-large-v3-turbo model is in the process of being fine-tuned, and I hope it will work well. (I'll release it in a week.)
On CV17:
WER: 27.65%
CER: 6.77%
Norm WER: 23.53%
Norm CER: 5.77%
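For the CT2 (CTranslate2) route mentioned just below, here is a minimal sketch of running a converted model with faster-whisper; the model directory is a placeholder, not a released model ID:

```python
# pip install faster-whisper
# Sketch of loading a CTranslate2-converted Whisper model with faster-whisper.
from faster_whisper import WhisperModel

model = WhisperModel(
    "path/to/whisper-hu-ct2",      # placeholder: local CT2 model directory
    device="cuda",                 # or "cpu"
    compute_type="int8_float16",   # quantized inference
)

segments, info = model.transcribe("video_audio.wav", language="hu")
for seg in segments:
    print(f"[{seg.start:6.2f} -> {seg.end:6.2f}] {seg.text}")
```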
If you don't want to use CT2, then I have a better model (it's better than all Hungarian-specific models, but it cannot be quantized with CT2):
sarpba/whisper-hu-large-v3-turbo-finetuned
On CV17:
WER: 12.67%
CER: 2.42%
Norm WER: 10.59%
Norm CER: 1.95%
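For the offline transcript-plus-timestamps use case, a minimal sketch of running this model with the transformers ASR pipeline (the audio file name is a placeholder):

```python
# pip install transformers torch
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # GPU if available, else CPU

asr = pipeline(
    "automatic-speech-recognition",
    model="sarpba/whisper-hu-large-v3-turbo-finetuned",
    torch_dtype=torch.float16 if device == 0 else torch.float32,
    device=device,
)

# chunk_length_s lets the pipeline handle long audio;
# return_timestamps=True gives segment-level timestamps.
result = asr(
    "video_audio.wav",  # placeholder input file
    chunk_length_s=30,
    return_timestamps=True,
    generate_kwargs={"language": "hungarian", "task": "transcribe"},
)

print(result["text"])
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```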
Thanks a lot for the information!! The last model seems perfect for our use case (extracting transcripts and timestamps offline).
I have one last (naive) question: the performance gain after fine-tuning on the target language is pretty high. Is that specific to Hungarian being a mid-resource language the model is less familiar with (non-Indo-European, for example), or is it the same for all languages?
If I understand your question, then the answer is: the performance gain for these models applies only to the Hungarian language. I think performance on other languages is worse than with the original models.
If this is not what you were asking, please rephrase the question a little so that I can understand it better.
My question was a bit different: if I fine-tune whisper-large-v3-turbo-finetuned on Spanish data, will the performance gain on Spanish be as big as the one for Hungarian? Because Spanish is likely to be more present in the initial training set of whisper-large-v3-turbo-finetuned.
The model is already much better at Spanish by default, so the level of improvement seen for Hungarian is certainly not achievable for Spanish, but I think there is always a point to fine-tuning.
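On the "always a point to fine-tuning" note, here is a compressed sketch of the standard Hugging Face Whisper fine-tuning recipe, pointed at Spanish. The base model, dataset (which is gated and needs authentication), and all hyperparameters are illustrative assumptions, not the setup used for the Hungarian models:

```python
# pip install transformers datasets torch
# Compressed sketch of the usual Hugging Face Whisper fine-tuning recipe.
# Base model, dataset and hyperparameters are illustrative only.
from dataclasses import dataclass

from datasets import Audio, load_dataset
from transformers import (Seq2SeqTrainer, Seq2SeqTrainingArguments,
                          WhisperForConditionalGeneration, WhisperProcessor)

model_id = "openai/whisper-large-v3-turbo"
processor = WhisperProcessor.from_pretrained(model_id, language="spanish", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Common Voice is gated: accept the terms on the Hub and `huggingface-cli login` first.
ds = load_dataset("mozilla-foundation/common_voice_17_0", "es", split="train[:1000]")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

def prepare(batch):
    audio = batch["audio"]
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names)

@dataclass
class DataCollator:
    processor: WhisperProcessor
    decoder_start_token_id: int

    def __call__(self, features):
        inputs = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(inputs, return_tensors="pt")
        labels = self.processor.tokenizer.pad(
            [{"input_ids": f["labels"]} for f in features], return_tensors="pt"
        )
        # pad tokens are ignored by the loss
        batch["labels"] = labels["input_ids"].masked_fill(labels["attention_mask"].ne(1), -100)
        # drop the start token here; the model re-adds it when shifting labels right
        if (batch["labels"][:, 0] == self.decoder_start_token_id).all():
            batch["labels"] = batch["labels"][:, 1:]
        return batch

args = Seq2SeqTrainingArguments(
    output_dir="whisper-es-finetuned",
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    max_steps=1000,   # a real run needs far more steps and an eval set
    fp16=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=DataCollator(processor, model.config.decoder_start_token_id),
)
trainer.train()
```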