
	---
	license: agpl-3.0
	metrics:
	- wer
	base_model:
	- openai/whisper-large-v3-turbo
	pipeline_tag: automatic-speech-recognition
	tags:
	- upper_sorbian
	---


	## Model Description

	This model is a fine-tune of `openai/whisper-large-v3-turbo` on over 24 hours of transcribed Upper Sorbian speech. It is intended to support future research on the language, as well as its conservation and revitalisation.


	## Training Data
	- Source: Stiftung für das sorbische Volk / Załožba za serbski lud (https://stiftung.sorben.com/)
	- Volume: 1493 minutes (about 24.9 hours); 10% validation set, 10% test set
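The split sizes implied by these figures can be worked out with a few lines of arithmetic. This is only a sketch: it assumes the 1493 minutes are the total corpus and that both percentages refer to audio duration, neither of which the card states explicitly. The function name `split_minutes` is made up for illustration.

```python
# Assumption: 1493 minutes is the full corpus and the 10% figures are
# fractions of total duration (the card does not say how the split was made).
TOTAL_MINUTES = 1493
VALIDATION_FRACTION = 0.10
TEST_FRACTION = 0.10


def split_minutes(total, val_fraction, test_fraction):
    """Return the duration in minutes of each split."""
    validation = total * val_fraction
    test = total * test_fraction
    train = total - validation - test
    return {"train": train, "validation": validation, "test": test}


splits = split_minutes(TOTAL_MINUTES, VALIDATION_FRACTION, TEST_FRACTION)
for name, minutes in splits.items():
    print(f"{name}: {minutes:.1f} min ({minutes / 60:.1f} h)")
```

Under these assumptions, roughly 80% of the audio remains for training.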

	## Training Details
	- Hyperparameters:
	  - Batch size: 64
	  - Learning rate: 3e-6, linear decay
	  - Optimizer: AdamW
	  - Warmup: 1000 steps
	- Additional techniques: BF16 training, initial 15 layers frozen
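The learning-rate schedule and the layer freezing listed above might look roughly like the sketch below. It is not the authors' training script. Several details are assumptions: the total number of training steps is not given in the card (10,000 is a placeholder), the decay is assumed to end at zero, and "initial 15 layers" is read here as the first 15 encoder layers. The helper names `lr_at_step` and `freeze_first_layers` are hypothetical.

```python
PEAK_LR = 3e-6        # from the card
WARMUP_STEPS = 1000   # from the card
TOTAL_STEPS = 10_000  # assumption: the card does not state the step count


def lr_at_step(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR,
               warmup_steps=WARMUP_STEPS):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


def freeze_first_layers(layers, n_frozen=15):
    """Disable gradients for the first n_frozen layers.

    `layers` can be any sequence of objects exposing `.parameters()`,
    for example a torch.nn.ModuleList of transformer blocks.
    """
    for layer in list(layers)[:n_frozen]:
        for param in layer.parameters():
            param.requires_grad = False


for step in (0, 500, 1000, 5500, 10_000):
    print(step, lr_at_step(step))
```

With a loaded `WhisperForConditionalGeneration` instance named `model`, the encoder blocks are available as `model.model.encoder.layers`, so `freeze_first_layers(model.model.encoder.layers)` would apply the freeze under the encoder-layer reading above.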


	## Performance
	### Metrics
	- Word Error Rate: 6.2
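Word error rate is the number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes it with a word-level edit distance. The card does not say which tool or text normalisation produced the reported figure, and 6.2 is presumably a percentage, so this is an illustration of the metric rather than a reproduction of the evaluation. The example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER: {wer * 100:.1f}%")
```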

	## Usage
	### Example Code

	The following example loads the model and transcribes a single audio file:

	```python
	import torch
	import torchaudio
	from transformers import WhisperProcessor, WhisperForConditionalGeneration

	# Load the model and processor
	model_name = "DILHTWD/whisper-large-v3-turbo-hsb"
	processor_name = "openai/whisper-large-v3-turbo"
	processor = WhisperProcessor.from_pretrained(processor_name)
	model = WhisperForConditionalGeneration.from_pretrained(model_name)

	# Load the audio, mix it down to mono, and resample to 16 kHz if needed
	audio, sample_rate = torchaudio.load("test.mp3")
	audio = audio.mean(dim=0)  # (channels, samples) -> (samples,)
	if sample_rate != 16000:
	    audio = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)(audio)
	input_features = processor(audio.numpy(), sampling_rate=16000, return_tensors="pt").input_features

	# Generate transcription
	with torch.no_grad():
	    predicted_ids = model.generate(input_features)
	transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

	# Print the transcription
	print("Transcription:", transcription)
	```
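By default, the Whisper feature extractor pads or truncates its input to 30 seconds, so the example above only transcribes the beginning of a longer recording. One simple workaround, sketched here with a hypothetical helper `split_into_windows`, is to cut the 16 kHz waveform into 30-second windows and transcribe each one separately. Fixed boundaries can cut words in half, so treat this as a starting point only.

```python
SAMPLE_RATE = 16000   # Hz, as expected by the Whisper processor
WINDOW_SECONDS = 30   # length of one Whisper input window


def split_into_windows(samples, sample_rate=SAMPLE_RATE,
                       window_seconds=WINDOW_SECONDS):
    """Split a 1-D sequence of samples into consecutive fixed-size windows.

    Works with any sliceable sequence that supports len(), such as a list,
    a NumPy array or a 1-D torch tensor. The last window may be shorter.
    """
    window = sample_rate * window_seconds
    return [samples[start:start + window]
            for start in range(0, len(samples), window)]


# Example: 70 seconds of silence yields two full windows and one short one.
dummy_audio = [0.0] * (SAMPLE_RATE * 70)
windows = split_into_windows(dummy_audio)
print([len(w) / SAMPLE_RATE for w in windows])
```

Each window can then be passed through `processor(...)` and `model.generate(...)` as in the example above, and the resulting strings joined with spaces. The `transformers` automatic-speech-recognition pipeline also accepts a `chunk_length_s` argument that handles long audio with overlapping chunks.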

	## Model Details
	- Model Name: DILHTWD/whisper-large-v3-turbo-hsb
	- Publisher: Data Intelligence Lab, Hochschule für Technik und Wirtschaft Dresden
	- Model Version: 1.0.0
	- Model Date: 2024-11-15
	- License: [AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.de.html)
	- Architecture: Whisper Large v3 Turbo
	- Task: Automatic Speech Recognition