---
license: apache-2.0
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
model-index:
- name: MahmoudAshraf/acft-whisper-large-v3-turbo
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      name: distil-whisper/earnings22
      type: distil-whisper/earnings22
    metrics:
    - name: WER
      type: WER
      value: 15.605
---
# Model Card
## Model Description
This model is part of a series of fine-tuned [OpenAI Whisper models](https://github.com/openai/whisper).
The models have been fine-tuned for dynamic audio context robustness, which allows shorter audio contexts to be used for better performance on short audio inputs. The method is detailed [in our GitHub repo](https://github.com/futo-org/whisper-acft).
- **Developed by:** Mahmoud Ashraf, inspired by FUTO
- **License:** Apache-2.0
- **Finetuned from model:** OpenAI Whisper (`openai/whisper-large-v3-turbo`)
## Uses
These models are not useful by themselves under default Whisper runtime configurations.
The easiest way to test differing audio context is to use whisper.cpp with the `--audio-context` parameter. We provide converted whisper.cpp models in our [GitHub README](https://github.com/futo-org/whisper-acft?tab=readme-ov-file#finetuning-whisper-for-dynamic-audio-context-robustness).
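As a rough illustration of choosing a value for `--audio-context`, the sketch below scales Whisper's full context of 1500 encoder positions per 30-second window to the clip length. The helper names `audio_context` and `audio_context_args` are hypothetical, and the default padding of 128 is an assumption borrowed from the note in the Metrics section; it assumes that padding is counted in the same units as the audio context.

```python
import math

FULL_CONTEXT = 1500     # encoder positions Whisper uses for a full 30 s window
WINDOW_SECONDS = 30.0   # length of Whisper's fixed input window


def audio_context(duration_s: float, padding: int = 128) -> int:
    """Estimate an audio-context size for a clip of `duration_s` seconds.

    Hypothetical helper: scales the full context linearly with clip length,
    adds `padding` (assumed to be in encoder positions), and caps the result
    at the full context.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    positions_per_second = FULL_CONTEXT / WINDOW_SECONDS  # 50 positions per second
    needed = math.ceil(duration_s * positions_per_second)
    return min(FULL_CONTEXT, needed + padding)


def audio_context_args(duration_s: float, padding: int = 128) -> list[str]:
    """Build the `--audio-context` argument pair for a whisper.cpp command line."""
    return ["--audio-context", str(audio_context(duration_s, padding))]


if __name__ == "__main__":
    for seconds in (2.0, 5.0, 30.0):
        print(seconds, audio_context_args(seconds))
```

The returned list can be appended to whatever whisper.cpp invocation is already in use; clips close to 30 seconds simply fall back to the full context.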
## Metrics
Speed was evaluated with TensorRT-LLM using in-flight batching.
In the dynamic-context runs, the audio context was padded with an additional 128 for stability.

| Model Name | WER on Earnings22 | Relative Speed |
|------------------------------------------------------------------|--------|----------------|
| Large-V3 Full Context | 15.283 | 1.0x |
| Large-V3 Dynamic Context | 17.515 | 2.1x |
| [MahmoudAshraf/acft-whisper-large-v3](https://huggingface.co/MahmoudAshraf/acft-whisper-large-v3) | 15.381 | 2.1x |
| Large-V3 Turbo Full Context | 15.373 | 1.9x |
| Large-V3 Turbo Dynamic Context | 62.921 | 6.4x |
| This Model | 15.605 | 5.1x |
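
To make the trade-off in the table easier to read, the following sketch computes each model's WER change relative to the Large-V3 Full Context baseline. The numbers are copied from the table above; the `results` dictionary and the `relative_wer_change` helper are illustrative names, not part of any released tooling.

```python
BASELINE = "Large-V3 Full Context"

# (WER on Earnings22, relative speed), copied from the table above.
results = {
    "Large-V3 Full Context": (15.283, 1.0),
    "Large-V3 Dynamic Context": (17.515, 2.1),
    "MahmoudAshraf/acft-whisper-large-v3": (15.381, 2.1),
    "Large-V3 Turbo Full Context": (15.373, 1.9),
    "Large-V3 Turbo Dynamic Context": (62.921, 6.4),
    "This Model": (15.605, 5.1),
}


def relative_wer_change(name: str) -> float:
    """Return the WER change versus the baseline, in percent of the baseline WER."""
    baseline_wer = results[BASELINE][0]
    return (results[name][0] - baseline_wer) / baseline_wer * 100.0


if __name__ == "__main__":
    for name, (wer, speed) in results.items():
        change = relative_wer_change(name)
        print(f"{name}: WER {wer:.3f} ({change:+.1f}% vs baseline), {speed}x speed")
```

Read this way, this model gives up roughly 2% relative WER against the full-context Large-V3 baseline in exchange for the 5.1x speed-up, while the Turbo model without fine-tuning degrades sharply under dynamic context.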
## Other Information
More information can be found in this [GitHub README](https://github.com/futo-org/whisper-acft?tab=readme-ov-file#finetuning-whisper-for-dynamic-audio-context-robustness).