---
license: apache-2.0
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
model-index:
  - name: MahmoudAshraf/acft-whisper-large-v3-turbo
    results:
      - task:
          type: automatic-speech-recognition
        dataset:
          name: distil-whisper/earnings22
          type: distil-whisper/earnings22
        metrics:
          - name: WER
            type: WER
            value: 15.605
---
# Model Card

## Model Description

This model is part of a fine-tuned series of [OpenAI's Whisper models](https://github.com/openai/whisper).

The models have been fine-tuned for dynamic audio-context robustness, allowing the audio context to be shortened for better performance on short audio inputs. The method is detailed [in our GitHub repo](https://github.com/futo-org/whisper-acft).

- **Developed by:** Mahmoud Ashraf, inspired by FUTO
- **License:** Apache-2.0
- **Finetuned from model:** OpenAI Whisper

## Uses

These models provide no benefit under default Whisper runtime configurations, which always use the full audio context.

The easiest way to test a reduced audio context is whisper.cpp's `--audio-context` parameter. We provide converted whisper.cpp models in our [GitHub README](https://github.com/futo-org/whisper-acft?tab=readme-ov-file#finetuning-whisper-for-dynamic-audio-context-robustness).
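As a rough sketch of how one might pick a value for `--audio-context`: Whisper's encoder maps 30 seconds of audio to 1500 positions, i.e. about 50 positions per second, so the context can be scaled to the clip length. The helper name and file paths below are placeholders, not part of whisper.cpp; only the `--audio-context` flag comes from the text above.

```python
import math
import shlex

def audio_context_for(duration_s: float, max_ctx: int = 1500) -> int:
    """Approximate audio context for a clip: ~50 encoder positions
    per second of audio, clamped to the full 30-second context."""
    return min(max_ctx, math.ceil(duration_s * 50))

# Build a whisper.cpp command line for a 6-second clip
# (model/audio paths are placeholders).
cmd = [
    "./main",
    "-m", "ggml-acft-whisper-large-v3-turbo.bin",
    "-f", "clip.wav",
    "--audio-context", str(audio_context_for(6.0)),
]
print(shlex.join(cmd))
```

For a 6-second clip this yields an audio context of 300 instead of the full 1500.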

## Metrics

Speed was evaluated with TensorRT-LLM using in-flight batching. The dynamic context was padded with an additional 128 positions for stability.
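A minimal sketch of the padding arithmetic described above, assuming the same 50-positions-per-second mapping (the function name is ours, not from the evaluation code):

```python
import math

def padded_audio_context(duration_s: float, pad: int = 128,
                         max_ctx: int = 1500) -> int:
    """Dynamic audio context plus a 128-position safety pad,
    clamped to the full 30-second context of 1500 positions."""
    return min(max_ctx, math.ceil(duration_s * 50) + pad)

# A 6-second clip: 300 positions of context + 128 pad = 428,
# still well under the full 1500.
print(padded_audio_context(6.0))
```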

| Model Name                                                                                                      | WER on Earnings22 | Relative Speed |
|-----------------------------------------------------------------------------------------------------------------|-------------------|----------------|
| Large-V3 Full Context                                                                                           | 15.283            | 1.0x           |
| Large-V3 Dynamic Context                                                                                        | 17.515            | 2.1x           |
| [MahmoudAshraf/acft-whisper-large-v3](https://huggingface.co/MahmoudAshraf/acft-whisper-large-v3)               | 15.381            | 2.1x           |
| Large-V3 Turbo Full Context                                                                                     | 15.373            | 1.9x           |
| Large-V3 Turbo Dynamic Context                                                                                  | 62.921            | 6.4x           |
| This Model                                                                                                      | 15.605            | 5.1x           |

## Other Information

More information can be found in this [GitHub README](https://github.com/futo-org/whisper-acft?tab=readme-ov-file#finetuning-whisper-for-dynamic-audio-context-robustness).