Tingusto committed
Commit 1d95344 · verified · 1 Parent(s): 9e36430

Update README.md

Files changed (1): README.md (+198 −189)
---
title: Audio-Transcriptor
sdk: gradio
emoji: ⚡
colorFrom: green
colorTo: red
pinned: false
short_description: Audio transcription with speaker diarization using Whisper.
---
# Audio Transcription and Diarization Tool

## Overview

This project provides a robust set of tools for transcribing audio files using the Whisper model and performing speaker diarization with PyAnnote. Users can process audio files, record audio, and save transcriptions with speaker identification.

## Table of Contents
- [Features](#features)
- [Requirements](#requirements)
- [Setup](#setup)
- [Usage](#usage)
  - [Basic Example](#basic-example)
  - [Audio Processing Example](#audio-processing-example)
  - [Transcribing an Existing Audio File or Recording](#transcribing-an-existing-audio-file-or-recording)
- [Key Components](#key-components)
  - [Transcriptor](#transcriptor)
  - [AudioProcessor](#audioprocessor)
  - [AudioRecording](#audiorecording)
- [Contributing](#contributing)
- [Acknowledgments](#acknowledgments)

## Features

- **Transcription**: Convert audio files in various formats to text (automatically converts to WAV).
- **Speaker Diarization**: Identify different speakers in the audio.
- **Speaker Retrieval**: Name speakers during transcription.
- **Audio Recording**: Record audio directly from a microphone.
- **Audio Preprocessing**: Includes resampling, format conversion, and audio enhancement.
- **Multiple Model Support**: Choose from various Whisper model sizes.

## Supported Whisper Models

This tool supports various Whisper model sizes, allowing you to balance accuracy and computational resources:

- **`tiny`**: Fastest, lowest accuracy
- **`base`**: Fast, good accuracy
- **`small`**: Balanced speed and accuracy
- **`medium`**: High accuracy, slower
- **`large`**: High accuracy, resource-intensive
- **`large-v1`**: Improved large model
- **`large-v2`**: Further improved large model
- **`large-v3`**: Latest and most accurate
- **`large-v3-turbo`**: Optimized for faster processing

Specify the model size when initializing the Transcriptor:

```python
transcriptor = Transcriptor(model_size="base")
```

The default model size is `base` if not specified.

## Requirements

To run this project, you need Python 3.7+ and the following packages:

```plaintext
- openai-whisper
- pyannote.audio
- librosa
- tqdm
- python-dotenv
- termcolor
- pydub
- SpeechRecognition
- pyaudio
- tabulate
- soundfile
- torch
- numpy
- transformers
- gradio
```

Install the required packages using:

```bash
pip install -r requirements.txt
```

## Setup

1. **Clone the repository**:
   ```bash
   git clone https://github.com/your-username/audio-transcription-tool.git
   cd audio-transcription-tool
   ```

2. **Install the required packages**:
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up your environment variables**:
   - Create a `.env` file in the root directory.
   - Add your Hugging Face token:
     ```plaintext
     HF_TOKEN=your_hugging_face_token_here
     ```

## Usage

### Basic Example

Here's how to use the `Transcriptor` class to transcribe an audio file:

```python
from pyscript import Transcriptor

# Initialize the Transcriptor
transcriptor = Transcriptor()

# Transcribe an audio file
transcription = transcriptor.transcribe_audio("/path/to/audio")

# Interactively name speakers
transcription.get_name_speakers()

# Save the transcription
transcription.save()
```

### Audio Processing Example

Use the `AudioProcessor` class to preprocess your audio files:

```python
from pyscript import AudioProcessor

# Load an audio file
audio = AudioProcessor("/path/to/audio.mp3")

# Display audio details
audio.display_details()

# Convert to WAV format and resample to 16000 Hz
audio.convert_to_wav()

# Display updated audio details
audio.display_changes()
```

### Transcribing an Existing Audio File or Recording

To transcribe an existing audio file, or to record and transcribe new audio, use the demo application provided in `demo.py`:

```bash
python demo.py
```

## Key Components

### Transcriptor

The `Transcriptor` class (in `pyscript/transcriptor.py`) is the core of the transcription process. It handles:

- Loading the Whisper model
- Setting up the diarization pipeline
- Processing audio files
- Performing transcription and diarization

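Merging the two stages comes down to matching each transcribed segment with the speaker turn it overlaps most. A minimal sketch of that matching step (the function name and data shapes are illustrative, not the project's actual API):

```python
def assign_speakers(segments, turns):
    """Label transcription segments with speakers by maximum temporal overlap.

    segments: list of (start, end, text) tuples from the transcription step.
    turns:    list of (start, end, speaker) tuples from the diarization step.
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in turns:
            # Length of the intersection of the two time intervals
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled
```

The real `Transcriptor` may resolve ties and overlapping speech differently, but the principle is the same.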
### AudioProcessor

The `AudioProcessor` class (in `pyscript/audio_processing.py`) manages audio file preprocessing, including:

- Loading audio files
- Resampling
- Converting to WAV format
- Displaying audio file details and changes
- Audio enhancement (noise reduction, voice enhancement, volume boost)

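Resampling is the preprocessing step that matters most here, since the pipeline targets 16 kHz input. The project uses librosa for this; purely as an illustration of what resampling does, here is a naive linear-interpolation version (librosa's resampler is band-limited and should be preferred in practice):

```python
import numpy as np

def resample_linear(samples, orig_sr, target_sr=16000):
    """Naive resampler via linear interpolation (illustrative only)."""
    duration = len(samples) / orig_sr
    n_out = int(round(duration * target_sr))
    # Time stamps of the input and output sample grids
    old_times = np.linspace(0.0, duration, num=len(samples), endpoint=False)
    new_times = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(new_times, old_times, samples)
```

One second of 44.1 kHz audio comes out as 16,000 samples; `AudioProcessor.convert_to_wav()` performs the equivalent conversion with a proper filter.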
### AudioRecording

The `audio_recording.py` module provides functions for recording audio from a microphone, checking input devices, and saving audio files.

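The saving half of that workflow needs nothing beyond the standard library's `wave` module. A sketch (the function name is illustrative — the module's actual helpers may differ, and the real recording step captures frames from the microphone via pyaudio):

```python
import math
import struct
import wave

def save_wav(path, samples, sample_rate=16000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 16-bit samples
        wf.setframerate(sample_rate)
        # Clamp each float and pack it as a little-endian signed 16-bit int
        wf.writeframes(b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        ))

# Example: save one second of a 440 Hz test tone
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
save_wav("test_tone.wav", tone)
```

The resulting file is already in the mono 16 kHz WAV format the transcription pipeline expects.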
## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a new branch: `git checkout -b feature-branch-name`
3. Make your changes and commit them: `git commit -m 'Add some feature'`
4. Push to the branch: `git push origin feature-branch-name`
5. Submit a pull request

## Acknowledgments

- OpenAI for the Whisper model
- PyAnnote for the speaker diarization pipeline
- All contributors and users of this project