	---
	license: apache-2.0
	base_model:
	- openai/whisper-large-v3-turbo
	pipeline_tag: automatic-speech-recognition
	model-index:
	- name: MahmoudAshraf/acft-whisper-large-v3-turbo
	  results:
	  - task:
	      type: automatic-speech-recognition
	    dataset:
	      name: distil-whisper/earnings22
	      type: distil-whisper/earnings22
	    metrics:
	    - name: WER
	      type: WER
	      value: 15.605
	---
	# Model Card

	## Model Description

	This model is part of a series of fine-tuned [OpenAI Whisper models](https://github.com/openai/whisper).

	The models have been fine-tuned for robustness to dynamic audio context, so that the audio context can be shortened for better performance (faster inference) on short audio inputs. The method is detailed in the [FUTO whisper-acft GitHub repo](https://github.com/futo-org/whisper-acft).

	- Developed by: Mahmoud Ashraf, inspired by FUTO
	- License: Apache-2.0
	- Fine-tuned from model: OpenAI Whisper (`openai/whisper-large-v3-turbo`)

	## Uses

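	Under default Whisper runtime configurations, which always use the full audio context, these models offer no benefit by themselves. The benefit appears only when the runtime shortens the audio context.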

	The easiest way to test different audio context sizes is to use whisper.cpp with the `--audio-context` parameter. The FUTO project provides converted whisper.cpp models in its [GitHub README](https://github.com/futo-org/whisper-acft?tab=readme-ov-file#finetuning-whisper-for-dynamic-audio-context-robustness).
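As a rough illustration, the sketch below assembles a whisper.cpp command line with a reduced audio context. The `-m`, `-f`, and `--audio-context` options are whisper.cpp CLI flags; the binary name, model path, audio path, and the helper `whisper_cpp_command` are placeholders chosen for this example, so adjust them to your own build and files. The command is only printed, not executed.

```python
import shlex

FULL_AUDIO_CTX = 1500  # Whisper's full encoder context (30 s of audio)


def whisper_cpp_command(binary: str, model_path: str, audio_path: str,
                        audio_ctx: int) -> list[str]:
    """Build (but do not run) a whisper.cpp invocation with a custom audio context."""
    if not 0 <= audio_ctx <= FULL_AUDIO_CTX:
        raise ValueError(f"audio_ctx must be between 0 and {FULL_AUDIO_CTX}")
    return [
        binary,
        "-m", model_path,                   # converted ggml model file
        "-f", audio_path,                   # input audio file
        "--audio-context", str(audio_ctx),  # 0 means the full context
    ]


# Hypothetical paths: replace with your whisper.cpp binary, model, and audio.
cmd = whisper_cpp_command("./main", "models/ggml-model.bin", "sample.wav", 628)
print(shlex.join(cmd))
```

Passing the arguments as a list (for example to `subprocess.run`) avoids shell quoting problems if a path contains spaces.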

	## Metrics
	Speed was evaluated with TensorRT-LLM using in-flight batching.
	For the dynamic-context runs, the audio context was padded with an additional 128 context positions for stability.

	| Model Name | WER on Earnings22 | Relative Speed |
	|---|---|---|
	| Large-V3 Full Context | 15.283 | 1.0x |
	| Large-V3 Dynamic Context | 17.515 | 2.1x |
	| [MahmoudAshraf/acft-whisper-large-v3](https://huggingface.co/MahmoudAshraf/acft-whisper-large-v3) | 15.381 | 2.1x |
	| Large-V3 Turbo Full Context | 15.373 | 1.9x |
	| Large-V3 Turbo Dynamic Context | 62.921 | 6.4x |
	| This Model | 15.605 | 5.1x |
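To read the trade-off off the table, the short sketch below compares one row against a baseline row. The numbers are copied from the table; the dictionary and the `compare` helper are only illustrative.

```python
# (WER on Earnings22, relative speed), copied from the table above.
results = {
    "Large-V3 Full Context": (15.283, 1.0),
    "Large-V3 Turbo Full Context": (15.373, 1.9),
    "Large-V3 Turbo Dynamic Context": (62.921, 6.4),
    "This Model": (15.605, 5.1),
}


def compare(name: str, baseline: str) -> tuple[float, float]:
    """Return (absolute WER change in points, speedup factor) versus a baseline row."""
    wer, speed = results[name]
    base_wer, base_speed = results[baseline]
    return wer - base_wer, speed / base_speed


delta, speedup = compare("This Model", "Large-V3 Turbo Full Context")
print(f"WER change: {delta:+.3f} points, speedup: {speedup:.2f}x")
```

Against the unmodified turbo model at full context, this model gives up roughly 0.23 WER points for roughly a 2.7x speedup, whereas the unmodified turbo model with dynamic context degrades to a WER of 62.921.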

	## Other Information

	More information can be found in this [GitHub README](https://github.com/futo-org/whisper-acft?tab=readme-ov-file#finetuning-whisper-for-dynamic-audio-context-robustness).