title: Case Study1
emoji: 🦀
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
QuickTranscribe
QuickTranscribe is a Python web application that transcribes audio to text using Automatic Speech Recognition (ASR). Users can upload an audio file or record from a microphone. The app also reports details such as RAM utilization during the transcription process. Transcription uses the "openai/whisper-large-v3" model from Hugging Face.
Features
- Microphone and File Upload Support: Users can transcribe audio from either a microphone or an uploaded audio file.
- Local and API-based Transcription: Option to use a local model or an API for transcription.
- RAM Utilization Display: Shows how much RAM was utilized during the transcription process.
- Speech-to-Text Transcription with Time Tracking: Converts audio to text and reports how long each transcription took.
- Model Used: The application uses the "openai/whisper-large-v3" model for transcription, which is hosted on the Hugging Face Hub.
Installation
Prerequisites
- Python 3.x
- psutil library for RAM usage tracking
- gradio for the web interface
- transformers library for the ASR pipeline
- huggingface_hub for API access
You can install the required dependencies using pip:
pip install psutil gradio transformers huggingface_hub
Clone the repository
git clone https://github.com/VenkateshRoshan/MLOPs-CaseStudy1.git
cd MLOPs-CaseStudy1
Usage
Running the Application
To start the application, run the following command:
python app.py
This will launch a Gradio interface where you can choose to transcribe either using an uploaded audio file or the microphone input.
Options
- Microphone Input: Click on the "Microphone" tab to start recording and transcribe the audio.
- Audio File Upload: Use the "Audio File" tab to upload an audio file for transcription.
- Use API: Check the "Use API" checkbox if you want to use the Hugging Face API for transcription instead of the local pipeline.
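The "Use API" choice amounts to dispatching between two backends and timing the call. The following is a minimal sketch, assuming both backends are passed in as plain callables. The name `run_transcription`, its parameters, and the wording of the local-mode message are illustrative and not taken from app.py.

```python
import time
from typing import Callable, Tuple


def run_transcription(
    audio_path: str,
    use_api: bool,
    local_backend: Callable[[str], str],
    api_backend: Callable[[str], str],
) -> Tuple[str, str]:
    """Transcribe audio_path with the selected backend and time the call.

    Both backends are hypothetical callables that take a file path and
    return the transcribed text. In a real app the local one might wrap a
    transformers ASR pipeline and the API one a huggingface_hub client.
    """
    backend = api_backend if use_api else local_backend
    label = "API" if use_api else "local model"  # local wording is assumed

    start = time.perf_counter()
    text = backend(audio_path)
    elapsed = time.perf_counter() - start

    return text, f"Using {label} it took: {elapsed:.2f} seconds"


if __name__ == "__main__":
    # Stand-in backends so the sketch runs without a model or network access.
    fake_local = lambda path: f"local transcript of {path}"
    fake_api = lambda path: f"API transcript of {path}"

    text, timing = run_transcription("sample.wav", True, fake_local, fake_api)
    print(text)
    print(timing)
```

Passing the backends in as arguments keeps the timing logic independent of how each backend is implemented, so either one can be swapped or stubbed out for testing.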
Output
- Transcribed Text: The text transcribed from the uploaded or recorded audio will be displayed.
- Time Taken: The time taken for the transcription process is displayed.
- RAM Utilization: A text box shows the amount of RAM used during the transcription process, that amount as a percentage of total system RAM, and the total system RAM.
Example Output
Here’s an example of the displayed output:
- Transcribed Text: "This is an example transcription."
- Time Taken: "Using API it took: 12.34 seconds"
- RAM Utilization: "RAM Used: 0.56 GB (3.45%), Total RAM: 16.0 GB"
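A string in the shape of the RAM Utilization example can be produced from two byte counts. This is a minimal sketch, not the code in app.py: the function name `format_ram_usage` is hypothetical, and it assumes the values are reported in binary gigabytes. In practice the byte counts could come from psutil, for example `psutil.virtual_memory().total` for the total.

```python
def format_ram_usage(used_bytes: int, total_bytes: int) -> str:
    """Format RAM usage in the style of the example output.

    The byte counts are plain inputs here so the sketch has no
    third-party dependencies; a real app could read them from psutil.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")

    gib = 1024 ** 3  # binary gigabytes; the unit used by the app is assumed
    used_gb = used_bytes / gib
    total_gb = total_bytes / gib
    percent = 100.0 * used_bytes / total_bytes

    return (
        f"RAM Used: {used_gb:.2f} GB ({percent:.2f}%), "
        f"Total RAM: {total_gb:.1f} GB"
    )


if __name__ == "__main__":
    print(format_ram_usage(4 * 1024 ** 3, 16 * 1024 ** 3))
```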
Future Enhancements
- GPU Integration: To address performance issues with CPU processing, integrating the product with Hugging Face’s GPU instances could significantly speed up transcription times, especially for longer audio files or real-time applications. Offering GPU as an option would provide a faster, more scalable solution for users who need high-speed transcription services.
- Batch Processing and Caching: Implementing batch processing or caching for repeated tasks (such as transcribing the same file multiple times) could reduce resource usage and improve performance. By grouping multiple audio files or requests together, the product could optimize processing times and reduce wait times for users.
- Enhanced User Interface Features: The user experience could be further enhanced by adding features like audio segmentation (to break up long audio files into smaller parts) and progress indicators during transcription. This would improve the usability of the product, especially for users transcribing lengthy recordings.
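The caching idea for repeated transcriptions of the same file could be sketched as follows. This is an illustration of one possible approach, not part of the current application: the class `TranscriptionCache` and its method are hypothetical, the cache lives in memory only, and the transcription backend is passed in as a callable.

```python
import hashlib
from typing import Callable, Dict


class TranscriptionCache:
    """In-memory cache keyed by the SHA-256 digest of the audio bytes."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get_or_transcribe(
        self, audio_bytes: bytes, transcribe: Callable[[bytes], str]
    ) -> str:
        # Identical file contents always hash to the same key,
        # regardless of the file name they were uploaded under.
        key = hashlib.sha256(audio_bytes).hexdigest()
        if key not in self._store:
            # Cache miss: run the expensive transcription once and keep it.
            self._store[key] = transcribe(audio_bytes)
        return self._store[key]


if __name__ == "__main__":
    calls = []

    def fake_transcribe(data: bytes) -> str:
        # Stand-in for a real ASR call; records how often it is invoked.
        calls.append(data)
        return f"transcript of {len(data)} bytes"

    cache = TranscriptionCache()
    audio = b"\x00\x01fake-audio"
    print(cache.get_or_transcribe(audio, fake_transcribe))
    print(cache.get_or_transcribe(audio, fake_transcribe))
    print(f"backend calls: {len(calls)}")
```

Keying on a content hash rather than the file name means a re-uploaded copy of the same audio is served from the cache, while any change to the audio triggers a fresh transcription.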