---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-small
tags:
- generated_from_trainer
model-index:
- name: whisper-small-indo-eng
  results: []
---
# whisper-small-indo-eng
## Model description
This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the [cobrayyxx/FLEURS_INDO-ENG_Speech_Translation](https://huggingface.co/datasets/cobrayyxx/FLEURS_INDO-ENG_Speech_Translation) dataset.
## Dataset: FLEURS_INDO-ENG_Speech_Translation
This model was fine-tuned using the `cobrayyxx/FLEURS_INDO-ENG_Speech_Translation` dataset, a speech translation dataset for the **Indonesian ↔ English** language pair. The dataset is part of the FLEURS (Few-shot Learning Evaluation of Universal Representations of Speech) collection and is specifically designed for speech-to-text translation tasks.
### Key Features:
- **audio**: Audio clip in Indonesian (Bahasa Indonesia).
- **text_indo**: Transcription of the audio in Indonesian.
- **text_en**: English translation of the audio.
### Dataset Usage
- **Training Data**: Used to fine-tune the Whisper model for Indonesian → English speech-to-text translation.
- **Validation Data**: Used to evaluate the performance of the model during training.
## Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps (epoch): 100
- mixed_precision_training: Native AMP
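The shape of a `linear` schedule with 500 warmup steps can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the Trainer's own code: it assumes linear warmup from zero followed by linear decay to zero, the function name `linear_schedule_lr` is hypothetical, and `total_steps=4000` is a made-up value chosen only for illustration (it is not taken from this card).

```python
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=500, total_steps=4000):
    """Learning rate at `step` for linear warmup followed by linear decay.

    base_lr and warmup_steps come from the hyperparameters listed above;
    total_steps=4000 is a hypothetical value used only for illustration.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the learning rate at a few points along the schedule.
for step in (0, 250, 500, 2250, 4000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate peaks at `learning_rate` exactly when warmup ends, so a run shorter than the warmup period would never reach the configured peak value.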
## Model Evaluation
The baseline and fine-tuned models were evaluated with the BLEU and CHRF metrics on the validation split of the dataset.
The fine-tuned model improves on the baseline by 1.79 BLEU points and 8.74 CHRF points.
| Model | BLEU Score | CHRF Score |
|------------------|------------|------------|
| Baseline Model | **33.03** | **52.71** |
| Fine-Tuned Model | **34.82** | **61.45** |
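The gap between the two rows can be expressed as an absolute gain (in points) and a relative gain (in percent). The sketch below only redoes that arithmetic on the scores from the table; the helper name `improvement` is hypothetical.

```python
# Scores copied from the table above.
scores = {
    "BLEU": {"baseline": 33.03, "fine_tuned": 34.82},
    "CHRF": {"baseline": 52.71, "fine_tuned": 61.45},
}


def improvement(baseline, fine_tuned):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = fine_tuned - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative, 1)


for metric, s in scores.items():
    print(metric, improvement(s["baseline"], s["fine_tuned"]))
```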
### Evaluation Details
- **BLEU**: Measures the overlap between predicted and reference text based on n-grams.
- **CHRF**: Uses character n-grams for evaluation, making it particularly suitable for morphologically rich languages.
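To make the two ideas concrete, here is a simplified sketch of each building block, written without any external library. It is not the sacrebleu implementation and will not reproduce the scores above: full BLEU combines several n-gram orders with a brevity penalty, and full CHRF averages over several character n-gram orders. The function names are hypothetical.

```python
from collections import Counter


def ngrams(items, n):
    """Count all contiguous n-grams in a sequence (list of words or a string)."""
    return Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))


def clipped_precision(prediction, reference, n):
    """BLEU-style word n-gram precision: matches are clipped by reference counts."""
    pred = ngrams(prediction.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((pred & ref).values())
    return overlap / max(1, sum(pred.values()))


def char_fscore(prediction, reference, n=3, beta=2.0):
    """CHRF-style F-score over character n-grams of a single order n."""
    pred = ngrams(prediction, n)
    ref = ngrams(reference, n)
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    # beta > 1 weights recall more heavily than precision.
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


print(clipped_precision("the the cat", "the cat sat", 1))
print(char_fscore("abcd", "abcde"))
```

Because the character-level score still rewards partially matching words, it is less sensitive to small differences in word endings than a word-level n-gram match.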
### Reproduce Steps
After [training](https://huggingface.co/blog/fine-tune-whisper) the model and pushing it to the Hugging Face Hub,
a few more steps are needed before it can be evaluated:
1. Push the tokenizer manually by creating it from `WhisperTokenizerFast`.
```
from transformers import WhisperTokenizerFast
# Create the tokenizer from the base checkpoint, configured for translation into English
tokenizer = WhisperTokenizerFast.from_pretrained("openai/whisper-small", language="en", task="translate")
# Save the tokenizer locally
tokenizer.save_pretrained("whisper-small-indo-eng", legacy_format=False)
# Push the tokenizer to the Hugging Face Hub
tokenizer.push_to_hub("cobrayyxx/whisper-small-indo-eng")
```
2. Convert the model from the Transformers format to the CTranslate2 format (source: https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file#model-conversion).
```
!ct2-transformers-converter --model cobrayyxx/whisper-small-indo-eng --output_dir cobrayyxx/whisper-small-indo-eng-ct2 --copy_files tokenizer.json preprocessor_config.json --quantization float16
```
3. Load the converted CTranslate2 model with `WhisperModel`; in this case it is `cobrayyxx/whisper-small-indo-eng-ct2`.
4. Run the evaluation, using faster-whisper to load the model and sacrebleu to compute the metrics.
```
import numpy as np
from faster_whisper import WhisperModel
from sacrebleu.metrics import BLEU, CHRF
from tqdm import tqdm

# Converted CTranslate2 model from step 2; loaded once and reused for every clip
model_name = "cobrayyxx/whisper-small-indo-eng-ct2"
model = WhisperModel(model_name, device="cuda", compute_type="float16")

def predict(audio_array):
    segments, info = model.transcribe(audio_array,
                                      beam_size=5,
                                      language="en",
                                      vad_filter=True
                                      )
    return segments, info

def metric_calculation(dataset):
    val_data = dataset["validation"]
    bleu = BLEU()
    chrf = CHRF()
    lst_pred = []
    lst_gold = []
    for data in tqdm(val_data):
        gold_standard = data["text_en"]
        gold_standard = gold_standard.lower().strip()
        audio_array = data["audio"]["array"]
        # Ensure it's 1D
        audio_array = np.ravel(audio_array)
        # Convert to float32 if necessary
        audio_array = audio_array.astype(np.float32)
        pred_segments, pred_info = predict(audio_array)
        prediction_text = " ".join(segment.text for segment in pred_segments).lower().strip()
        lst_pred.append(prediction_text)
        lst_gold.append(gold_standard)
    # sacrebleu expects a list of reference streams, each aligned with the predictions
    bleu_score = bleu.corpus_score(lst_pred, [lst_gold]).score
    chrf_score = chrf.corpus_score(lst_pred, [lst_gold]).score
    return bleu_score, chrf_score
```
Now run the evaluation.
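```
from datasets import load_dataset
# Assumption: the dataset is loaded from the Hub with its default configuration
fleurs_dataset = load_dataset("cobrayyxx/FLEURS_INDO-ENG_Speech_Translation")
bleu_score, chrf_score = metric_calculation(fleurs_dataset)
bleu_score, chrf_score
```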
## Framework versions
- Transformers 4.46.3
- PyTorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.20.3
## Reference
- https://huggingface.co/blog/fine-tune-whisper
## Credits
Huge thanks to [Yasmin Moslem](https://huggingface.co/ymoslem) for mentoring me.