---
title: Audio-Transcriptor
sdk: gradio
emoji: 
colorFrom: green
colorTo: red
pinned: false
short_description: Audio transcription with speaker diarization using Whisper.
sdk_version: 5.3.0
---
# Audio Transcription and Diarization Tool

## Overview

This project provides tools for transcribing audio files with OpenAI's Whisper model and for speaker diarization with PyAnnote. You can process existing audio files, record new audio from a microphone, and save transcriptions with the speakers identified.

## Table of Contents
- [Features](#features)
- [Supported Whisper Models](#supported-whisper-models)
- [Requirements](#requirements)
- [Setup](#setup)
- [Usage](#usage)
  - [Basic Example](#basic-example)
  - [Audio Processing Example](#audio-processing-example)
  - [Transcribing an Existing Audio File or Recording](#transcribing-an-existing-audio-file-or-recording)
- [Key Components](#key-components)
  - [Transcriptor](#transcriptor)
  - [AudioProcessor](#audioprocessor)
  - [AudioRecording](#audiorecording)
- [Contributing](#contributing)
- [Acknowledgments](#acknowledgments)

## Features

- **Transcription**: Convert audio files in various formats to text (automatically converts to WAV).
- **Speaker Diarization**: Identify different speakers in the audio.
- **Speaker Retrieval**: Interactively assign names to the speakers detected in a transcription.
- **Audio Recording**: Record audio directly from a microphone.
- **Audio Preprocessing**: Includes resampling, format conversion, and audio enhancement.
- **Multiple Model Support**: Choose from various Whisper model sizes.

## Supported Whisper Models

This tool supports various Whisper model sizes, allowing you to balance accuracy and computational resources:

- **`tiny`**: Fastest, lowest accuracy
- **`base`**: Fast, good accuracy
- **`small`**: Balanced speed and accuracy
- **`medium`**: High accuracy, slower
- **`large`**: High accuracy, resource-intensive
- **`large-v1`**: Original release of the large model
- **`large-v2`**: Improved large model
- **`large-v3`**: Latest large model, most accurate
- **`large-v3-turbo`**: Pruned variant of `large-v3`, optimized for faster processing

Specify the model size when initializing the Transcriptor:

```python
transcriptor = Transcriptor(model_size="base")
```

If you omit `model_size`, the default is `"base"`.

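Because the model size is passed as a plain string, a small guard can catch typos before a large model download starts. The sketch below is illustrative only: `SUPPORTED_MODEL_SIZES`, `DEFAULT_MODEL_SIZE`, and `resolve_model_size` are hypothetical names, not part of this project's API, and the tuple simply mirrors the sizes listed above.

```python
# Hypothetical helper, not part of this project: validate a model size
# before passing it to Transcriptor(model_size=...).
SUPPORTED_MODEL_SIZES = (
    "tiny", "base", "small", "medium",
    "large", "large-v1", "large-v2", "large-v3", "large-v3-turbo",
)
DEFAULT_MODEL_SIZE = "base"


def resolve_model_size(requested=None):
    """Return a supported model size, falling back to the default when unset."""
    if requested is None or not requested.strip():
        return DEFAULT_MODEL_SIZE
    size = requested.strip().lower()
    if size not in SUPPORTED_MODEL_SIZES:
        options = ", ".join(SUPPORTED_MODEL_SIZES)
        raise ValueError(f"Unknown model size {requested!r}; choose one of: {options}")
    return size
```

The result could then be passed on, for example as `Transcriptor(model_size=resolve_model_size("small"))`.
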
## Requirements

To run this project, you need Python 3.10+ (the minimum supported by Gradio 5, the SDK version configured for this Space) and the following packages:

```plaintext
- openai-whisper
- pyannote.audio
- librosa
- tqdm
- python-dotenv
- termcolor
- pydub
- SpeechRecognition
- pyaudio
- tabulate
- soundfile
- torch
- numpy
- transformers
- gradio
```

Install the required packages using:

```bash
pip install -r requirements.txt
```

## Setup

1. **Clone the repository**:
   ```bash
   git clone https://github.com/your-username/audio-transcription-tool.git
   cd audio-transcription-tool
   ```

2. **Install the required packages**:
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up your environment variables**:
   - Create a `.env` file in the root directory.
   - Add your Hugging Face token:
     ```plaintext
     HF_TOKEN=your_hugging_face_token_here
     ```

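To confirm that the token is visible to Python before starting a long run, a check along these lines may help. It is a sketch under assumptions: `get_hf_token` is a hypothetical helper, not part of this project, and how the project itself reads `HF_TOKEN` is not shown here. `load_dotenv` comes from the `python-dotenv` package listed in the requirements.

```python
import os

try:
    from dotenv import load_dotenv  # provided by python-dotenv
except ImportError:  # fall back to plain environment variables
    load_dotenv = None


def get_hf_token(env=None):
    """Return the Hugging Face token, or raise if it is missing.

    Hypothetical helper: `env` defaults to the process environment,
    after loading a `.env` file when python-dotenv is installed.
    """
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # reads a .env file if one is found
        env = os.environ
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to your .env file.")
    return token
```
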
## Usage

### Basic Example

Here's how to use the Transcriptor class to transcribe an audio file:

```python
from pyscript import Transcriptor

# Initialize the Transcriptor
transcriptor = Transcriptor()

# Transcribe an audio file
transcription = transcriptor.transcribe_audio("/path/to/audio")

# Interactively name speakers
transcription.get_name_speakers()

# Save the transcription
transcription.save()
```

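The same two calls can be applied to a whole folder of recordings. The following is a minimal sketch, assuming only the `transcribe_audio()` and `save()` methods shown in the basic example; `AUDIO_EXTENSIONS`, `find_audio_files`, and `transcribe_folder` are hypothetical names introduced for illustration, and the extension set is a guess to adjust to your own files.

```python
from pathlib import Path

# Assumed set of extensions; adjust to the formats you actually use.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def find_audio_files(folder, extensions=AUDIO_EXTENSIONS):
    """Return audio files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


def transcribe_folder(transcriptor, folder):
    """Transcribe and save every audio file in `folder`.

    `transcriptor` is expected to expose `transcribe_audio(path)` returning
    an object with `save()`, as in the basic example.
    """
    results = {}
    for path in find_audio_files(folder):
        transcription = transcriptor.transcribe_audio(str(path))
        transcription.save()
        results[path.name] = transcription
    return results
```
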
### Audio Processing Example

Use the AudioProcessor class to preprocess your audio files:

```python
from pyscript import AudioProcessor

# Load an audio file
audio = AudioProcessor("/path/to/audio.mp3")

# Display audio details
audio.display_details()

# Convert to WAV format and resample to 16000 Hz
audio.convert_to_wav()

# Display updated audio details
audio.display_changes()
```

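To check a converted file independently of `AudioProcessor`, the standard-library `wave` module can read a WAV header directly. This sketch is not the project's implementation: `wav_details` is a hypothetical helper, and it assumes an uncompressed PCM WAV file.

```python
import wave


def wav_details(path):
    """Read basic header information from a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate if rate else 0.0,
        }
```

If the conversion resampled to 16000 Hz as described in the example, `sample_rate_hz` for the converted file should read `16000`.
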
### Transcribing an Existing Audio File or Recording

To transcribe an existing audio file, or to record audio and transcribe it, run the demo application in `demo.py`:

```bash
python demo.py
```

## Key Components

### Transcriptor

The `Transcriptor` class (in `pyscript/transcriptor.py`) is the core of the transcription process. It handles:

- Loading the Whisper model
- Setting up the diarization pipeline
- Processing audio files
- Performing transcription and diarization

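One common way to combine transcription and diarization is to label each transcript segment with the speaker whose turns overlap it most. The sketch below illustrates that idea only; it is not taken from `Transcriptor`, which may merge the two outputs differently. The function name `assign_speakers` and the input shapes (segment dicts with `start`, `end`, `text`, and `(start, end, speaker)` tuples) are assumptions made for the example.

```python
def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each transcript segment with the speaker who overlaps it most.

    segments: list of dicts with "start", "end", "text" (times in seconds).
    turns: list of (start, end, speaker) tuples from diarization.
    """
    labeled = []
    for seg in segments:
        overlap_by_speaker = {}
        for start, end, speaker in turns:
            # Length of the intersection of the two time intervals.
            overlap = min(seg["end"], end) - max(seg["start"], start)
            if overlap > 0:
                overlap_by_speaker[speaker] = (
                    overlap_by_speaker.get(speaker, 0.0) + overlap
                )
        if overlap_by_speaker:
            best = max(overlap_by_speaker, key=overlap_by_speaker.get)
        else:
            best = unknown  # no diarization turn covers this segment
        labeled.append({**seg, "speaker": best})
    return labeled
```

Segments that no turn covers keep the `unknown` label rather than being dropped, so the transcript text stays complete.
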
### AudioProcessor

The `AudioProcessor` class (in `pyscript/audio_processing.py`) manages audio file preprocessing, including:

- Loading audio files
- Resampling
- Converting to WAV format
- Displaying audio file details and changes
- Audio enhancement (noise reduction, voice enhancement, volume boost)

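As an illustration of what a volume boost involves, a gain in decibels corresponds to multiplying each sample by `10 ** (gain_db / 20)`, with clipping to keep values in range. This is a minimal sketch on plain Python floats, not the code in `AudioProcessor`; `boost_volume` is a hypothetical name, and samples are assumed to be normalized to `[-1.0, 1.0]`.

```python
def boost_volume(samples, gain_db):
    """Apply a gain in decibels to float samples in [-1.0, 1.0], clipping overflow."""
    factor = 10 ** (gain_db / 20)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]
```

Clipping distorts loud passages, so a large gain is best applied only to quiet recordings.
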
### AudioRecording

The `audio_recording.py` module provides functions for recording audio from a microphone, checking input devices, and saving audio files.

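Checking input devices with PyAudio can be sketched as follows. This is not the code in `audio_recording.py`: `list_input_devices` is a hypothetical helper that relies only on PyAudio's `get_device_count()` and `get_device_info_by_index()` methods and keeps devices that report at least one input channel.

```python
def list_input_devices(audio):
    """Return (index, name) for every device that can capture audio.

    `audio` is a `pyaudio.PyAudio` instance, or any object with the same
    `get_device_count` / `get_device_info_by_index` methods.
    """
    devices = []
    for index in range(audio.get_device_count()):
        info = audio.get_device_info_by_index(index)
        if info.get("maxInputChannels", 0) > 0:
            devices.append((index, info.get("name", "")))
    return devices


# Usage (requires PyAudio and a working PortAudio installation):
#   import pyaudio
#   pa = pyaudio.PyAudio()
#   print(list_input_devices(pa))
#   pa.terminate()
```
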
## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a new branch: `git checkout -b feature-branch-name`
3. Make your changes and commit them: `git commit -m 'Add some feature'`
4. Push to the branch: `git push origin feature-branch-name`
5. Submit a pull request

## Acknowledgments

- OpenAI for the Whisper model
- PyAnnote for the speaker diarization pipeline
- All contributors and users of this project