---
license: mit
language:
- en
tags:
- speaker
- speaker_Recognition
- gender
- voicebased
- ai
- ml
---

# AuthEcho_Project

This project provides pre-trained deep learning models that predict a **Speaker**'s identity and **Gender** from a voice recording.

The repository offers a **Speaker and Gender Prediction System** built with **TensorFlow**, **Librosa**, and **Gradio**. Given an audio file, the application predicts the top 3 candidate speakers with their probabilities and determines the speaker's gender. When the prediction confidence falls below a threshold, it labels the speaker as unknown.

## Features

- Predicts the top 3 speakers from an audio file.
- Determines the gender of the speaker.
- Identifies unknown speakers using a confidence threshold.
- Provides a Gradio interface for easy testing.

## Getting Started

### Prerequisites

To run this application, you need:

- **Python**: version 3.8 or higher
- The following Python libraries:
  - `tensorflow`
  - `numpy`
  - `librosa`
  - `gradio`
  - `scikit-learn`

Install the required libraries with:

```
pip install tensorflow numpy librosa gradio scikit-learn
```

### Installation

1. **Clone the repository**:
   ```
   git clone https://github.com/your-username/speaker-gender-prediction.git
   cd speaker-gender-prediction
   ```
2. **Add the pre-trained models and label encoders**:
   Place the following files in the `models/` directory listed under Project Structure:
   - `lstm_speaker_model.h5`: pre-trained speaker recognition model.
   - `lstm_gender_model.h5`: pre-trained gender prediction model.
   - `lstm_speaker_label.pkl`: label encoder for the speaker classes.
   - `lstm_gender_label.pkl`: label encoder for the gender classes.

### Usage

Run the application with:

```
python app.py
```

### Gradio Interface

The Gradio interface allows you to:

- **Upload** an audio file or **record** audio directly.
- Predict the **top 3 speakers** and their probabilities.
- Determine the **gender** of the speaker.
- Detect **unknown speakers** using a confidence threshold.

## Project Structure

```
.
├── app.py                         # Main application file
├── models/lstm_speaker_model.h5   # Pre-trained speaker model (to be added)
├── models/lstm_gender_model.h5    # Pre-trained gender model (to be added)
├── models/lstm_speaker_label.pkl  # Speaker label encoder (to be added)
├── models/lstm_gender_label.pkl   # Gender label encoder (to be added)
├── requirements.txt               # Python dependencies
└── README.md                      # Project documentation
```

## Example Output

### Top 3 Predicted Speakers

```
The top 3 predicted speakers are:
Speaker 1: 85.23%
Speaker 2: 10.12%
Speaker 3: 4.65%
The predicted gender is: Male
```

### Unknown Speaker

```
The top 3 predicted speakers are:
Unknown: 45.23%
The predicted gender is: Unknown
```

## How It Works

1. **Feature Extraction**:
   - Extracts **MFCCs**, **chroma features**, and **spectral contrast** from the input audio file using `librosa`.
2. **Speaker and Gender Models**:
   - **Speaker Model**: a pre-trained LSTM model classifies the speaker from the extracted features.
   - **Gender Model**: a separate LSTM model predicts the speaker's gender.
3. **Unknown Detection**:
   - If the highest speaker confidence score is below a defined threshold, the speaker is classified as "Unknown". As the example output above shows, the gender is then also reported as "Unknown".

## Roadmap

- Add support for real-time audio predictions.
- Improve unknown speaker detection using open-set recognition techniques.
- Expand the dataset for more robust gender classification.

## Contributing

Contributions are welcome! To contribute:

1. Fork the repository.
2. Create a feature branch (`git checkout -b feature-branch-name`).
3. Commit your changes (`git commit -m "Add new feature"`).
4. Push to the branch (`git push origin feature-branch-name`).
5. Open a Pull Request.

## License

This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.

## Acknowledgments

- **TensorFlow**: for building the deep learning models.
- **Librosa**: for audio processing and feature extraction.
- **Gradio**: for creating the user interface.
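The top-3 ranking and the unknown-speaker rule from How It Works can be sketched in plain Python. This is a minimal illustration, not the code in `app.py`. It assumes that each model returns a probability vector aligned with its label encoder's classes. The threshold value (`0.5`) and the names `top_k_speakers` and `format_prediction` are hypothetical, because this README does not document the actual cut-off or function names. The output format mirrors the Example Output section.

```python
# Hypothetical sketch of the top-3 ranking and "Unknown" rule described in How It Works.
# UNKNOWN_THRESHOLD and the function names are illustrative assumptions,
# not values or APIs taken from app.py.

UNKNOWN_THRESHOLD = 0.5  # hypothetical confidence cut-off, as a probability in [0, 1]


def top_k_speakers(probs, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def format_prediction(speaker_probs, speaker_labels, gender_probs, gender_labels,
                      threshold=UNKNOWN_THRESHOLD):
    """Build a report in the style of the Example Output section."""
    top = top_k_speakers(speaker_probs, speaker_labels)
    best_prob = top[0][1]
    lines = ["The top 3 predicted speakers are:"]
    if best_prob < threshold:
        # Best match is not confident enough: report speaker and gender as unknown.
        lines.append(f"Unknown: {best_prob * 100:.2f}%")
        gender = "Unknown"
    else:
        for label, prob in top:
            lines.append(f"{label}: {prob * 100:.2f}%")
        # Pick the gender class with the highest probability.
        gender = max(zip(gender_labels, gender_probs), key=lambda pair: pair[1])[0]
    lines.append(f"The predicted gender is: {gender}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_prediction([0.8523, 0.1012, 0.0465],
                            ["Speaker 1", "Speaker 2", "Speaker 3"],
                            [0.9, 0.1], ["Male", "Female"]))
```

With the probabilities shown above, the script prints the same text as the "Top 3 Predicted Speakers" example. If the best speaker probability were 0.4523 instead, it would print the "Unknown Speaker" example. A fixed max-probability threshold is the simplest form of unknown detection. The open-set techniques mentioned in the Roadmap would replace this rule.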