# Import necessary libraries
import whisper
import os
from gtts import gTTS
import gradio as gr
from groq import Groq
import time
# Load Whisper tiny model for faster transcription
model = whisper.load_model("tiny")
# Set up Groq API client (ensure GROQ_API_KEY is set in your environment)
GROQ_API_KEY = os.environ.get("GROQ_API_KEY")  # Read the key from the environment; never hardcode secrets in source
client = Groq(api_key=GROQ_API_KEY)
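# Optional guard (a sketch, not part of the original app): `require_env` is a
# hypothetical helper that fails early with a readable message when a required
# environment variable such as GROQ_API_KEY is missing or blank. The `env`
# parameter is an assumption added so the lookup can be exercised with a plain dict.
def require_env(name, env=None):
    env = os.environ if env is None else env
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value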
# Function to get the LLM response from Groq with error handling and timing
def get_llm_response(user_input):
    try:
        start_time = time.time()  # Start time to track API delay
        chat_completion = client.chat.completions.create(
            messages=[{"role": "user", "content": user_input}],
            model="llama3-8b-8192",  # Replace with your desired model
        )
        response_time = time.time() - start_time  # Elapsed time of the completed call
        # Not a true timeout: the call has already finished, and a slow response is discarded
        if response_time > 10:  # You can adjust the threshold (seconds)
            return "The response took too long, please try again."
        return chat_completion.choices[0].message.content
    except Exception as e:
        return f"Error in LLM response: {e}"
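# Optional sketch (an assumption, not part of the original app): the check in
# get_llm_response only measures elapsed time after the request has finished.
# `call_with_timeout` is a hypothetical helper that stops waiting once
# `timeout_s` seconds have passed. The worker thread is not cancelled; it
# finishes in the background. A return value of None signals a timeout, so
# this sketch suits functions that never return None themselves.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def call_with_timeout(fn, *args, timeout_s=10.0):
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None  # Caller decides how to report the timeout
    finally:
        executor.shutdown(wait=False)  # Do not block on the still-running worker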
# Function to convert text to speech using gTTS
def text_to_speech(text, output_audio="output_audio.mp3"):
    try:
        tts = gTTS(text)  # Uses the gTTS default language (English)
        tts.save(output_audio)
        return output_audio
    except Exception as e:
        return f"Error in Text-to-Speech: {e}"
# Function for Text to Voice
def text_to_voice(user_text, voice="en"):
    # Note: `voice` is accepted from the UI but is not passed on to gTTS
    output_audio = text_to_speech(user_text)
    return output_audio  # Return only audio response
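# Optional sketch (an assumption, not part of the original app): the voice codes
# offered in the UI could be translated into gTTS keyword arguments. gTTS selects
# the language with `lang` and a regional English accent with `tld`.
# `VOICE_TO_GTTS` and `voice_to_gtts_kwargs` are hypothetical names.
VOICE_TO_GTTS = {
    "en": {"lang": "en"},
    "en-uk": {"lang": "en", "tld": "co.uk"},
    "en-au": {"lang": "en", "tld": "com.au"},
    "fr": {"lang": "fr"},
    "de": {"lang": "de"},
    "es": {"lang": "es"},
}

def voice_to_gtts_kwargs(voice):
    # Unknown codes fall back to default English; a copy is returned so
    # callers cannot mutate the shared table.
    return dict(VOICE_TO_GTTS.get(voice, VOICE_TO_GTTS["en"]))

# Possible use: gTTS(text, **voice_to_gtts_kwargs("en-uk"))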
# Main chatbot function to handle audio or text input and output
def chatbot(audio=None, user_text=None, voice="en"):
    try:
        # Step 1: If audio is provided, transcribe the audio using Whisper
        if audio:
            result = model.transcribe(audio)
            user_text = result["text"]
        # Check if the transcription (or typed text) is missing or empty
        if not user_text or not user_text.strip():
            return "No transcription found. Please try again.", None
        # Step 2: Get LLM response from Groq
        response_text = get_llm_response(user_text)
        # Step 3: Convert the response text to speech
        if response_text.startswith("Error"):
            return response_text, None
        output_audio = text_to_speech(response_text)
        if output_audio.startswith("Error"):
            return output_audio, None
        return response_text, output_audio
    except Exception as e:
        return f"Error in chatbot processing: {e}", None
# Define the About app section
def about_app():
    about_text = """
# About Voicesy AI
**Voicesy AI** is a real-time chatbot and voice conversion application developed by **Hamaad Ayub Khan**. It combines speech recognition, a large language model, and text-to-speech so that users can interact through both voice and text.
## Purpose
Voicesy AI is designed to make communication easier by converting spoken language into text and text into speech. It is particularly useful for people who prefer voice interaction or have difficulty typing, whether they are on the go or need accessibility support.
## Features
- **Voice-to-Voice Interaction**: Users speak, and the app transcribes the audio into text, processes it, and provides a spoken response.
- **Text-to-Speech Conversion**: Users type a message, and the app converts it into speech for easy listening.
- **Language Support**: The interface offers a voice selector with English, French, German, and Spanish options.
- **Intelligent Responses**: Responses are generated by a large language model, so they are relevant to the user's input.
## Technologies Used
- **Whisper**: An automatic speech recognition (ASR) model developed by OpenAI. Whisper transcribes spoken language into text so that voice input can be processed.
- **gTTS (Google Text-to-Speech)**: This library converts the text responses generated by the AI into spoken audio so that users can listen to them.
- **Groq**: The Groq API serves the language model that generates conversational responses based on user input.
- **Gradio**: The app is built with Gradio, a framework for creating web-based interfaces for machine learning applications. Gradio enables rapid prototyping and easy deployment of the app.
## Development
Voicesy AI was developed with a focus on accessibility and user experience. Hamaad Ayub Khan used Python for backend development and Gradio for the frontend interface. The app was tested and refined continuously to ensure it operates efficiently.
## Disclaimer
Voicesy AI relies on AI technologies that may make mistakes. Users are encouraged to verify critical information and to use the app as a supportive tool rather than a definitive source.
## Contact
For any inquiries or feedback regarding Voicesy AI, please reach out via the following social media links:
- [Instagram](https://instagram.com/hamaadayubkhan)
- [GitHub](https://github.com/hakgs1234)
- [LinkedIn](https://www.linkedin.com/in/hamaadayubkhan)
**Thank you for using Voicesy AI!**
"""
    return about_text
# Gradio interface for real-time interaction with voice selection
with gr.Blocks(css="style.css") as iface:  # Include the CSS file here
    gr.Markdown("# Voicesy AI")
    # Tab for Voice to Voice
    with gr.Tab("Voice to Voice"):
        audio_input = gr.Audio(type="filepath", label="Input Audio (optional)")  # Input from mic or file
        text_input = gr.Textbox(placeholder="Type your message here...", label="Input your Text To Interact with LLM")
        voice_selection = gr.Dropdown(choices=["en", "en-uk", "en-au", "fr", "de", "es"], label="Select Voice", value="en")  # Voice selection
        output_text = gr.Textbox(label="AI Response")
        output_audio = gr.Audio(type="filepath", label="AI Audio Response")
        # Button for Voice to Voice
        voice_to_voice_button = gr.Button("Voice to Voice")
        # Define button actions
        voice_to_voice_button.click(chatbot, inputs=[audio_input, text_input, voice_selection], outputs=[output_text, output_audio])
    # Tab for Text to Speech
    with gr.Tab("Text to Speech"):
        text_input = gr.Textbox(placeholder="Type your message here...", label="Input Text")
        voice_selection = gr.Dropdown(choices=["en", "en-uk", "en-au", "fr", "de", "es"], label="Select Voice", value="en")
        output_audio = gr.Audio(type="filepath", label="AI Audio Response")
        # Button to convert text to speech
        convert_button = gr.Button("Convert to Speech")
        convert_button.click(text_to_voice, inputs=[text_input, voice_selection], outputs=[output_audio])
    # Tab for About App
    with gr.Tab("About App"):
        about = gr.Markdown(about_app())
    # Set up the footer
    gr.Markdown("Voicesy AI | [Instagram](https://instagram.com/hamaadayubkhan) | [GitHub](https://github.com/hakgs1234) | [LinkedIn](https://www.linkedin.com/in/hamaadayubkhan)")
# Launch the Gradio app
iface.launch()