---
license: apache-2.0
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
model-index:
- name: MahmoudAshraf/acft-whisper-large-v3-turbo
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      name: distil-whisper/earnings22
      type: distil-whisper/earnings22
    metrics:
    - name: WER
      type: WER
      value: 15.605
---

# Model Card

## Model Description

This model is part of a series of fine-tuned [OpenAI Whisper models](https://github.com/openai/whisper).

The models have been fine-tuned for dynamic audio context robustness, so they can run with a shorter audio context and perform better on short audio inputs. The method is detailed in the [whisper-acft GitHub repo](https://github.com/futo-org/whisper-acft).

- **Developed by:** Mahmoud Ashraf, inspired by FUTO
- **License:** Apache-2.0
- **Finetuned from model:** OpenAI Whisper (`openai/whisper-large-v3-turbo`)

## Uses

These models are not useful by themselves under default Whisper runtime configurations; the benefit appears only when the runtime shortens the audio context.

The easiest way to test different audio context sizes is to use whisper.cpp with the `--audio-context` parameter. Converted whisper.cpp models are provided in the [whisper-acft GitHub README](https://github.com/futo-org/whisper-acft?tab=readme-ov-file#finetuning-whisper-for-dynamic-audio-context-robustness).

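As a rough illustration of the whisper.cpp route, the sketch below only assembles a command line that passes `--audio-context`. The binary name `whisper-cli`, the model file `ggml-model.bin`, and the clip `clip.wav` are placeholders assumed for this example, not files shipped with this model; the helper `whisper_cpp_command` is likewise hypothetical.

```python
import shlex


def whisper_cpp_command(model_path, wav_path, audio_ctx, binary="whisper-cli"):
    """Build a whisper.cpp command line that limits the encoder audio context.

    `binary`, `model_path` and `wav_path` are assumptions for illustration;
    adjust them to your local whisper.cpp build and converted model.
    """
    # Whisper's encoder has at most 1500 positions (a full 30 s window).
    if not 1 <= audio_ctx <= 1500:
        raise ValueError("audio_ctx must be between 1 and 1500")
    return [
        binary,
        "-m", model_path,                   # converted ggml model file
        "-f", wav_path,                     # input audio file
        "--audio-context", str(audio_ctx),  # reduced audio context
    ]


cmd = whisper_cpp_command("ggml-model.bin", "clip.wav", audio_ctx=256)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once the paths point at real files.
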
## Metrics

Speed was evaluated with TensorRT-LLM using in-flight batching.

Dynamic context was padded with an additional 128 context positions for stability.

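The exact rule used to pick the dynamic context for each clip is not stated here. A minimal sketch, assuming the context is derived from the clip duration at Whisper's 50 encoder positions per second (1500 positions for a 30-second window), then padded by 128 and capped at the full context, might look like this. The function name `dynamic_audio_ctx` is hypothetical.

```python
import math

MAX_AUDIO_CTX = 1500     # Whisper encoder positions for a full 30 s window
WINDOW_SECONDS = 30.0
STABILITY_PADDING = 128  # extra context reported in this card


def dynamic_audio_ctx(duration_s, padding=STABILITY_PADDING):
    """Return a padded audio context for a clip, capped at the full context.

    Assumption: the unpadded context is the number of encoder positions
    needed to cover the clip; the evaluation may have used another rule.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    positions_per_second = MAX_AUDIO_CTX / WINDOW_SECONDS  # 50 per second
    needed = math.ceil(duration_s * positions_per_second)
    return min(MAX_AUDIO_CTX, needed + padding)


for seconds in (2.5, 10.0, 30.0):
    print(seconds, dynamic_audio_ctx(seconds))
```

Under these assumptions a 2.5-second clip uses 253 positions instead of 1500, which is where the speedup on short inputs comes from.
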
| Model Name | WER on Earnings22 | Relative Speed |
|------------|-------------------|----------------|
| Large-V3 Full Context | 15.283 | 1.0x |
| Large-V3 Dynamic Context | 17.515 | 2.1x |
| [MahmoudAshraf/acft-whisper-large-v3](https://huggingface.co/MahmoudAshraf/acft-whisper-large-v3) | 15.381 | 2.1x |
| Large-V3 Turbo Full Context | 15.373 | 1.9x |
| Large-V3 Turbo Dynamic Context | 62.921 | 6.4x |
| This Model | 15.605 | 5.1x |

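To make the trade-off in the table easier to read, the following sketch compares this model with the unmodified Turbo model at full context. The numbers are copied from the table; the helper names are illustrative only.

```python
# WER and relative speed values copied from the table above.
RESULTS = {
    "Large-V3 Turbo Full Context": (15.373, 1.9),
    "Large-V3 Turbo Dynamic Context": (62.921, 6.4),
    "This Model": (15.605, 5.1),
}


def relative_wer_change(wer, baseline_wer):
    """Relative WER change versus a baseline, in percent."""
    return (wer - baseline_wer) / baseline_wer * 100.0


def speedup_over(speed, baseline_speed):
    """How many times faster a model is than the baseline."""
    return speed / baseline_speed


base_wer, base_speed = RESULTS["Large-V3 Turbo Full Context"]
wer, speed = RESULTS["This Model"]

wer_change = relative_wer_change(wer, base_wer)
speedup = speedup_over(speed, base_speed)
print(f"WER change vs Turbo full context: {wer_change:+.2f}%")
print(f"Speedup vs Turbo full context: {speedup:.2f}x")
```

By this calculation the fine-tuned model gives up roughly 1.5% relative WER for about a 2.7x speedup over Turbo at full context, whereas the unmodified Turbo model at dynamic context degrades to a WER of 62.921.
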
## Other Information

More information can be found in this [GitHub README](https://github.com/futo-org/whisper-acft?tab=readme-ov-file#finetuning-whisper-for-dynamic-audio-context-robustness).