---
license: apache-2.0
datasets:
- ivrit-ai/crowd-transcribe-v4
language:
- he
- en
base_model: openai/whisper-large-v2
pipeline_tag: automatic-speech-recognition
---

This is ivrit.ai's faster-whisper model, based on the ivrit-ai/whisper-v2-d4 Whisper model.

Training data includes 250 hours of volunteer-transcribed speech from the ivrit-ai/crowd-transcribe-v4 dataset, as well as 100 hours of professionally transcribed speech from other sources.

Release date: September 8th, 2024.

# Prerequisites

`pip3 install faster_whisper`

# Usage

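```python
import faster_whisper

# Load the model from the Hugging Face Hub (downloaded on first use).
model = faster_whisper.WhisperModel('ivrit-ai/faster-whisper-v2-d4')

# 'media-file' is a placeholder; replace it with the path to your audio or video file.
# transcribe() returns a lazy iterator of segments plus transcription info.
segs, _ = model.transcribe('media-file', language='he')

# Iterating over the segments runs the transcription.
texts = [s.text.strip() for s in segs]

transcribed_text = ' '.join(texts)
print(f'Transcribed text: {transcribed_text}')
```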
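
Each segment yielded by `transcribe` also carries `start` and `end` times in seconds alongside `text`, which is useful when the transcript needs timestamps. The following is a minimal sketch of one way to format them; the helper names `format_timestamp` and `format_segments` are hypothetical and not part of faster-whisper, and the `Segment` namedtuple is only a stand-in so the example runs without loading the model.

```python
from collections import namedtuple


def format_timestamp(seconds):
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments):
    """Return one '[start -> end] text' line per segment.

    Accepts any iterable of objects exposing .start, .end and .text,
    which is the shape of the segments produced by model.transcribe().
    """
    return [
        f"[{format_timestamp(s.start)} -> {format_timestamp(s.end)}] {s.text.strip()}"
        for s in segments
    ]


# Stand-in for real segments (assumption: used here only for illustration).
Segment = namedtuple("Segment", ["start", "end", "text"])
demo = [Segment(0.0, 2.5, " שלום"), Segment(2.5, 61.25, " hello")]

for line in format_segments(demo):
    print(line)
```

With the real model, `format_segments(segs)` can be called directly on the iterator returned by `model.transcribe(...)`. Because that iterator is consumed as it is read, collect the segments into a list first if both the plain text and the timestamped lines are needed.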