---
license: apache-2.0
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
model-index:
- name: MahmoudAshraf/acft-whisper-large-v3-turbo
results:
- task:
type: automatic-speech-recognition
dataset:
name: distil-whisper/earnings22
type: distil-whisper/earnings22
metrics:
- name: WER
type: WER
value: 15.605
---
# Model Card
## Model Description
This model is part of a series of fine-tuned [OpenAI Whisper models](https://github.com/openai/whisper).
The models have been fine-tuned for dynamic audio context robustness: they remain accurate when the encoder's audio context is shortened, which improves speed on short audio inputs. The method is detailed [in our GitHub repo](https://github.com/futo-org/whisper-acft).
- **Developed by:** Mahmoud Ashraf inspired by FUTO
- **License:** Apache-2.0
- **Finetuned from model:** OpenAI Whisper
## Uses
These models provide no benefit under default Whisper runtime configurations, which always use the full 30-second audio context.
The easiest way to test a reduced audio context is to use whisper.cpp with the `--audio-context` parameter. We provide converted whisper.cpp models in our [GitHub README](https://github.com/futo-org/whisper-acft?tab=readme-ov-file#finetuning-whisper-for-dynamic-audio-context-robustness).
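As a sketch of such an invocation (the model filename, input path, and context value below are illustrative assumptions, not files shipped with this card; `-ac`/`--audio-context` is whisper.cpp's flag for the encoder context):

```shell
# Transcribe a short clip with a reduced encoder audio context.
# Choose a context large enough to cover the clip:
# roughly 50 frames per second of audio, plus some padding.
./main -m models/ggml-acft-whisper-large-v3-turbo.bin \
       -f samples/short_clip.wav \
       --audio-context 384
```

Older whisper.cpp builds name the CLI binary `main`; newer releases call it `whisper-cli`, but the flag is the same.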
## Metrics
Speed was evaluated with TensorRT-LLM using in-flight batching.
The dynamic audio context was padded with an additional 128 frames of context for stability.
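A minimal sketch of the context computation implied above. The 50-frames-per-second rate follows from Whisper's encoder emitting 1500 hidden states per 30-second window; the function name and exact rounding are our assumptions, only the 128-frame padding comes from the evaluation setup:

```python
import math

# Whisper's encoder produces 1500 hidden states for a 30 s window,
# i.e. 50 states per second of audio.
FRAMES_PER_SECOND = 50
MAX_AUDIO_CONTEXT = 1500

def dynamic_audio_context(duration_s: float, pad: int = 128) -> int:
    """Audio context for a clip of `duration_s` seconds, padded for stability."""
    ctx = math.ceil(duration_s * FRAMES_PER_SECOND) + pad
    return min(ctx, MAX_AUDIO_CONTEXT)

# A 5 s clip needs 250 frames + 128 padding = 378;
# a full 30 s clip is capped at the model maximum of 1500.
print(dynamic_audio_context(5.0))   # 378
print(dynamic_audio_context(30.0))  # 1500
```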
| Model Name | WER (%) on Earnings22 | Relative Speed |
|---|---|---|
| Large-V3 Full Context | 15.283 | 1.0x |
| Large-V3 Dynamic Context | 17.515 | 2.1x |
| [MahmoudAshraf/acft-whisper-large-v3](https://huggingface.co/MahmoudAshraf/acft-whisper-large-v3) | 15.381 | 2.1x |
| Large-V3 Turbo Full Context | 15.373 | 1.9x |
| Large-V3 Turbo Dynamic Context | 62.921 | 6.4x |
| This Model | 15.605 | 5.1x |
## Other Information
More information can be found in this [GitHub README](https://github.com/futo-org/whisper-acft?tab=readme-ov-file#finetuning-whisper-for-dynamic-audio-context-robustness).