AuthEcho_Project

This project contains pre-trained deep learning models that predict a speaker's identity and gender from an audio file.

The repository offers a Speaker and Gender Prediction System built using TensorFlow, Librosa, and Gradio. The application predicts the top 3 speakers and their probabilities from an audio file, determines the speaker's gender, and classifies unknown speakers using a confidence threshold.

Features

  • Predicts the top 3 speakers from an audio file.
  • Determines the gender of the speaker.
  • Identifies unknown speakers with a confidence threshold.
  • Provides a Gradio interface for easy testing.

Getting Started

Prerequisites

To run this application, you need:

  • Python: Version 3.8 or higher
  • Required Python libraries:
    • tensorflow
    • numpy
    • librosa
    • gradio
    • scikit-learn

Install the required libraries with:

pip install tensorflow numpy librosa gradio scikit-learn

Installation

  1. Clone the Repository:
git clone https://github.com/your-username/speaker-gender-prediction.git
cd speaker-gender-prediction
  2. Add Pre-Trained Models and Label Encoders:

Place the following files in the repository's root directory:

  • lstm_speaker_model.h5: Pre-trained speaker recognition model.
  • lstm_gender_model.h5: Pre-trained gender prediction model.
  • lstm_speaker_label.pkl: Label encoder for speaker classes.
  • lstm_gender_label.pkl: Label encoder for gender classes.
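
Before launching the app, it can help to confirm that all four files are where the app expects them. The following is a minimal sketch, not part of app.py: the helper name `missing_model_files` is hypothetical, and the directory is passed as an argument because the Project Structure section lists these files under models/.

```python
from pathlib import Path
from typing import List

# File names taken from the list above.
REQUIRED_FILES = (
    "lstm_speaker_model.h5",
    "lstm_gender_model.h5",
    "lstm_speaker_label.pkl",
    "lstm_gender_label.pkl",
)


def missing_model_files(directory: str = ".") -> List[str]:
    """Return the required file names that are not present in `directory`.

    Hypothetical helper for illustration; app.py may load the files differently.
    """
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # Assumption: run from the repository root; pass "models" if the files live there.
    missing = missing_model_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model and label encoder files are in place.")
```

An empty result means every file was found in the directory you checked.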

Usage

Run the application using:

python app.py

Gradio Interface

The Gradio interface allows you to:

  • Upload an audio file or record audio directly.
  • Predict the top 3 speakers and their probabilities.
  • Determine the gender of the speaker.
  • Detect and classify unknown speakers using confidence thresholds.

Project Structure

.
├── app.py                # Main application file
├── models/lstm_speaker_model.h5   # Pre-trained speaker model (to be added)
├── models/lstm_gender_model.h5    # Pre-trained gender model (to be added)
├── models/lstm_speaker_label.pkl  # Speaker label encoder (to be added)
├── models/lstm_gender_label.pkl   # Gender label encoder (to be added)
├── requirements.txt        # Python dependencies
└── README.md               # Project documentation

Example Output

Top 3 Predicted Speakers:

The top 3 predicted speakers are:
Speaker 1: 85.23%
Speaker 2: 10.12%
Speaker 3: 4.65%

The predicted gender is: Male

Unknown Speaker:

The top 3 predicted speakers are:
Unknown: 45.23%

The predicted gender is: Unknown

How It Works

  1. Feature Extraction:

    • Extracts MFCCs, chroma features, and spectral contrast from the input audio file using librosa.
  2. Speaker and Gender Models:

    • Speaker Model: A pre-trained LSTM model classifies the speaker based on extracted features.
    • Gender Model: A separate LSTM model determines the gender.
  3. Unknown Detection:

    • If the highest confidence score for a speaker is below a defined threshold, the speaker is classified as "Unknown."
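
The unknown-detection step above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in app.py: the function names `top_k_with_unknown` and `format_report`, the 0.5 threshold, and the sample probabilities are all hypothetical, since this README does not state the threshold the app actually uses.

```python
from typing import List, Sequence, Tuple

# Hypothetical threshold; the README does not state the value app.py uses.
UNKNOWN_THRESHOLD = 0.5


def top_k_with_unknown(
    probs: Sequence[float],
    labels: Sequence[str],
    k: int = 3,
    threshold: float = UNKNOWN_THRESHOLD,
) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs.

    If the best probability is below `threshold`, return a single
    ("Unknown", best_probability) pair instead.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("probs must not be empty")
    # Sort (label, probability) pairs from most to least probable.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    best_prob = ranked[0][1]
    if best_prob < threshold:
        return [("Unknown", best_prob)]
    return ranked[:k]


def format_report(predictions: List[Tuple[str, float]]) -> str:
    """Format predictions in the style of the Example Output section."""
    lines = ["The top 3 predicted speakers are:"]
    lines += [f"{label}: {prob * 100:.2f}%" for label, prob in predictions]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up class probabilities standing in for the speaker model's output.
    labels = ["Speaker 1", "Speaker 2", "Speaker 3", "Speaker 4"]
    print(format_report(top_k_with_unknown([0.8523, 0.1012, 0.0400, 0.0065], labels)))
    print(format_report(top_k_with_unknown([0.4523, 0.3000, 0.1500, 0.0977], labels)))
```

A fixed threshold on the top probability is the simplest form of this check. A higher threshold rejects more recordings as "Unknown", while a lower one risks assigning an unseen voice to a known speaker.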

Roadmap

  • Add support for real-time audio predictions.
  • Improve unknown speaker detection using open-set recognition techniques.
  • Expand the dataset for more robust gender classification.

Contributing

Contributions are welcome! To contribute:

  1. Fork the repository.
  2. Create a feature branch (git checkout -b feature-branch-name).
  3. Commit your changes (git commit -m "Add new feature").
  4. Push to the branch (git push origin feature-branch-name).
  5. Open a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

  • TensorFlow: For building the deep learning models.
  • Librosa: For audio processing and feature extraction.
  • Gradio: For creating the user interface.