---
title: Multi Modal Emotion Recognition
emoji: 📈
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: mit
---

# Multi Modal Emotion Recognition 📈

This application analyzes emotions in videos using pretrained models for both the audio and the visual content. Upload a video (maximum length: 2 minutes) to extract emotions from speech and facial expressions in real-time.

## Features:

- **Audio Emotion Detection:** Uses OpenAI's Whisper model to transcribe speech and Cardiff NLP's RoBERTa model to recognize emotions in the transcribed text.
- **Visual Emotion Analysis:** Uses Salesforce's BLIP model to caption images from the video and J-Hartmann's DistilRoBERTa model to recognize emotions in the generated captions.

## Instructions:

1. Upload a video file (maximum length: **2 minutes**).
2. The app analyzes both the audio and visual components of the video, then displays the detected emotions.

## Models Used:

The models below were selected after extensive trials for this multimodal emotion recognition task. Each entry links to the model and, where available, the corresponding research paper:

1. **Cardiff NLP RoBERTa for Emotion Recognition from Text:**
   - [Model: cardiffnlp/twitter-roberta-base-emotion](https://huggingface.co/cardiffnlp/twitter-roberta-base-emotion)
   - [Paper: TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification](https://arxiv.org/pdf/2010.12421)
2. **Salesforce BLIP for Image Captioning and Visual Emotion Analysis:**
   - [Model: Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
   - [Paper: BLIP: Bootstrapping Language-Image Pre-training](https://arxiv.org/abs/2201.12086)
3. **J-Hartmann DistilRoBERTa for Emotion Recognition from Image Captions:**
   - [Model: j-hartmann/emotion-english-distilroberta-base](https://huggingface.co/j-hartmann/emotion-english-distilroberta-base)
4. **OpenAI Whisper for Speech-to-Text Transcription:**
   - [Model: openai/whisper-base](https://huggingface.co/openai/whisper-base)
   - [Paper: Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356)

## Access the App:

You can try the app [here](https://huggingface.co/spaces/Pradheep1647/multi-modal-emotion-recognition).

## License:

This project is licensed under the MIT License.
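
## Example Pipeline Sketch:

The two branches described under Features (speech to transcript to emotion, and frame to caption to emotion) can be sketched roughly as below. This is a minimal illustration, not the code in `app.py`: the helper names (`load_pipelines`, `top_label`, `audio_emotion`, `visual_emotion`) are hypothetical, and it assumes the audio track has already been extracted to a file and the frames have already been sampled from the video, since this README does not describe how the app performs those steps.

```python
from typing import Any, Callable, Dict, List


def load_pipelines() -> Dict[str, Callable]:
    """Build one Hugging Face pipeline per model listed in this README.

    Assumption: the `transformers` package is installed. Weights are
    downloaded on first use, so the import is kept local to this function.
    """
    from transformers import pipeline

    return {
        # chunk_length_s lets Whisper handle clips longer than 30 seconds.
        "transcribe": pipeline(
            "automatic-speech-recognition",
            model="openai/whisper-base",
            chunk_length_s=30,
        ),
        "text_emotion": pipeline(
            "text-classification",
            model="cardiffnlp/twitter-roberta-base-emotion",
        ),
        "caption": pipeline(
            "image-to-text",
            model="Salesforce/blip-image-captioning-base",
        ),
        "caption_emotion": pipeline(
            "text-classification",
            model="j-hartmann/emotion-english-distilroberta-base",
        ),
    }


def top_label(predictions: List[Dict[str, Any]]) -> str:
    """Return the label with the highest score from a classifier's output."""
    return max(predictions, key=lambda p: p["score"])["label"]


def audio_emotion(audio_path: str, transcribe: Callable, classify: Callable) -> str:
    """Transcribe the (already extracted) audio, then classify the transcript."""
    transcript = transcribe(audio_path)["text"]
    return top_label(classify(transcript))


def visual_emotion(frames: List[Any], caption: Callable, classify: Callable) -> List[str]:
    """Caption each (already sampled) frame, then classify each caption."""
    labels = []
    for frame in frames:
        caption_text = caption(frame)[0]["generated_text"]
        labels.append(top_label(classify(caption_text)))
    return labels
```

With real models, `pipes = load_pipelines()` followed by `audio_emotion("clip.wav", pipes["transcribe"], pipes["text_emotion"])` would return a single emotion label for the speech, where `clip.wav` is a placeholder path. Passing the pipelines in as arguments keeps the two branches independent, so either one can be swapped for a stub or a different model.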