Comparison with other methods
Hi, thanks for the model!
You state that this model is better than other available ones; do you have results backing this claim? Especially compared to this one, which seemed pretty good in 2022.
For example, what is the performance on the Hungarian test set of Common Voice?
Thanks in advance for the answer!
PS: We are looking for an ASR system to get transcripts of Hungarian videos for a research project.
Hi!
It's not the overall best Hungarian-specific ASR model, it's only the best whisper-base Hungarian-specific model. If you're looking for the best ASR for the Hungarian language, check my tests here: https://huggingface.co/sarpba/whisper-teszt-eredmenyek
It's a test across 4 datasets, and the test repo contains the test scripts too. Unfortunately, there is no overall winner: some models achieve better results on one dataset, some on another.
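For reference, here is a minimal sketch of how WER/CER numbers like the ones below can be computed. It assumes jiwer plus the Whisper BasicTextNormalizer for the "Norm" variants, which is my guess at the approach; the scripts actually used are in the test repo linked above:

```python
# pip install jiwer transformers
# Minimal sketch: plain and normalized WER/CER for reference/hypothesis pairs.
# The choice of normalizer is an assumption; see the linked test repo for the
# scripts actually used in the comparison.
import jiwer
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

normalizer = BasicTextNormalizer()  # language-agnostic: lowercase, strip punctuation

refs = ["Ez egy példa mondat."]   # ground-truth transcripts
hyps = ["ez egy pelda mondat"]    # model outputs

print(f"WER: {jiwer.wer(refs, hyps):.2%}   CER: {jiwer.cer(refs, hyps):.2%}")

norm_refs = [normalizer(r) for r in refs]
norm_hyps = [normalizer(h) for h in hyps]
print(f"Norm WER: {jiwer.wer(norm_refs, norm_hyps):.2%}   "
      f"Norm CER: {jiwer.cer(norm_refs, norm_hyps):.2%}")
```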
If it's not urgent: the whisper-large-v3-turbo model is in the process of being fine-tuned, and I hope it will work well. (I'll release it in a week.)
On CV17:
WER: 27.65%
CER: 6.77%
Norm WER: 23.53%
Norm CER: 5.77%
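For the CT2 (CTranslate2) route mentioned just below, here is a minimal sketch of running a converted model with faster-whisper; the model directory is a placeholder, not a released model ID:

```python
# pip install faster-whisper
# Sketch of loading a CTranslate2-converted Whisper model with faster-whisper.
from faster_whisper import WhisperModel

model = WhisperModel(
    "path/to/whisper-hu-ct2",      # placeholder: local CT2 model directory
    device="cuda",                 # or "cpu"
    compute_type="int8_float16",   # quantized inference
)

segments, info = model.transcribe("video_audio.wav", language="hu")
for seg in segments:
    print(f"[{seg.start:6.2f} -> {seg.end:6.2f}] {seg.text}")
```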
If you don't want to use CT2, then I have a better model (it's better than all Hungarian-specific models, but it cannot be quantized with CT2):
sarpba/whisper-hu-large-v3-turbo-finetuned
On CV17:
WER: 12.67%
CER: 2.42%
Norm WER: 10.59%
Norm CER: 1.95%
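For the offline transcript-plus-timestamps use case, a minimal sketch of running this model with the transformers ASR pipeline (the audio file name is a placeholder):

```python
# pip install transformers torch
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # GPU if available, else CPU

asr = pipeline(
    "automatic-speech-recognition",
    model="sarpba/whisper-hu-large-v3-turbo-finetuned",
    torch_dtype=torch.float16 if device == 0 else torch.float32,
    device=device,
)

# chunk_length_s lets the pipeline handle long audio;
# return_timestamps=True gives segment-level timestamps.
result = asr(
    "video_audio.wav",  # placeholder input file
    chunk_length_s=30,
    return_timestamps=True,
    generate_kwargs={"language": "hungarian", "task": "transcribe"},
)

print(result["text"])
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```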
Thanks a lot for the information!! The last model seems perfect for our use case (extracting transcripts and timestamps offline).
I have one last (naive) question: the performance gain after fine-tuning on the target language is pretty high. Is that specific to Hungarian being a mid-resource language the model is less familiar with (non-Indo-European, for example), or is it the same for all languages?
If I understand your question, then the answer is: the performance gain for these models applies only to the Hungarian language. I think performance on other languages is worse than with the original models.
If this is not what you were asking, please rephrase the question a little so that I can understand it better.
My question was a bit different: if I fine-tune whisper-large-v3-turbo-finetuned on Spanish data, will the performance gain on Spanish be as big as the one for Hungarian? Because Spanish is likely to be more present in the initial training set of whisper-large-v3-turbo-finetuned.
The model is already much better at Spanish by default, so the level of improvement seen for Hungarian is certainly not achievable for Spanish, but I think there is always a point to fine-tuning.
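On the "always a point to fine-tuning" note, here is a compressed sketch of the standard Hugging Face Whisper fine-tuning recipe, pointed at Spanish. The base model, dataset (which is gated and needs authentication), and all hyperparameters are illustrative assumptions, not the setup used for the Hungarian models:

```python
# pip install transformers datasets torch
# Compressed sketch of the usual Hugging Face Whisper fine-tuning recipe.
# Base model, dataset and hyperparameters are illustrative only.
from dataclasses import dataclass

from datasets import Audio, load_dataset
from transformers import (Seq2SeqTrainer, Seq2SeqTrainingArguments,
                          WhisperForConditionalGeneration, WhisperProcessor)

model_id = "openai/whisper-large-v3-turbo"
processor = WhisperProcessor.from_pretrained(model_id, language="spanish", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Common Voice is gated: accept the terms on the Hub and `huggingface-cli login` first.
ds = load_dataset("mozilla-foundation/common_voice_17_0", "es", split="train[:1000]")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

def prepare(batch):
    audio = batch["audio"]
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names)

@dataclass
class DataCollator:
    processor: WhisperProcessor
    decoder_start_token_id: int

    def __call__(self, features):
        inputs = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(inputs, return_tensors="pt")
        labels = self.processor.tokenizer.pad(
            [{"input_ids": f["labels"]} for f in features], return_tensors="pt"
        )
        # pad tokens are ignored by the loss
        batch["labels"] = labels["input_ids"].masked_fill(labels["attention_mask"].ne(1), -100)
        # drop the start token here; the model re-adds it when shifting labels right
        if (batch["labels"][:, 0] == self.decoder_start_token_id).all():
            batch["labels"] = batch["labels"][:, 1:]
        return batch

args = Seq2SeqTrainingArguments(
    output_dir="whisper-es-finetuned",
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    max_steps=1000,   # a real run needs far more steps and an eval set
    fp16=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=DataCollator(processor, model.config.decoder_start_token_id),
)
trainer.train()
```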