---
title: H GPT40
emoji: 🐢
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 4.37.1
app_file: app.py
pinned: false
license: afl-3.0
---

# H GPT4o - AI Assistant

## Overview

H GPT4o is an AI-powered assistant that combines rich conversation with image generation, web search, and image-based Q&A. Built on a set of open instruction-tuned models, it gives users an engaging, powerful tool for exploring creativity, gathering information, and solving problems.

## Model Description and Usage

### 1. Gemma (Mistral-7B-Instruct-v0.3)

  • Purpose: Used for general conversation and generating responses to user queries.
  • Usage: This model processes text-based inputs and generates human-like responses. It also orchestrates function calls for specific tasks like web search, image generation, and image-based Q&A.
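
A minimal sketch of what such a conversational call can look like with huggingface_hub, assuming the serverless Inference API; the mistralai/ repo prefix, the prompt format, and the generation parameters are assumptions rather than values taken from app.py:

```python
from huggingface_hub import InferenceClient

# Assumed full repo ID; the README only names "Mistral-7B-Instruct-v0.3".
client = InferenceClient("mistralai/Mistral-7B-Instruct-v0.3")

def chat(user_message: str) -> str:
    # Mistral-instruct prompt format; parameters are illustrative defaults.
    prompt = f"<s>[INST] {user_message} [/INST]"
    return client.text_generation(prompt, max_new_tokens=512, temperature=0.7)

print(chat("Introduce yourself in one sentence."))
```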

### 2. Mixtral (Nous-Hermes-2-Mixtral-8x7B-DPO)

  • Purpose: Handles complex queries, especially those requiring web-based information.
  • Usage: Mixtral is responsible for summarizing web search results and generating detailed and structured responses based on external data.
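
A hedged sketch of that fetch-and-summarize step, using the requests and beautifulsoup4 dependencies listed under Installation; the NousResearch/ repo prefix, prompt wording, and truncation limit are assumptions:

```python
import requests
from bs4 import BeautifulSoup
from huggingface_hub import InferenceClient

# Assumed full repo ID; the README only names "Nous-Hermes-2-Mixtral-8x7B-DPO".
summarizer = InferenceClient("NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO")

def summarize_page(url: str, question: str) -> str:
    # Fetch the page and keep only its visible text (truncated for the prompt).
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)[:4000]
    prompt = (
        "Using the page content below, answer the question in a structured way.\n"
        f"Question: {question}\n\nPage content:\n{text}\n\nAnswer:"
    )
    return summarizer.text_generation(prompt, max_new_tokens=300)
```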

### 3. LLaMA (Meta-Llama-3-8B-Instruct)

  • Purpose: Acts as a fallback model for text generation tasks.
  • Usage: When other models fail to generate a response, LLaMA is used to continue the conversation, ensuring uninterrupted interaction.

### 4. Yi-1.5 (34B-Chat)

  • Purpose: Provides diverse and creative responses.
  • Usage: Yi-1.5 helps the assistant reply like a friend, with a casual tone, short forms, and emojis.
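
One plausible way to get that persona is a system prompt passed through chat_completion; the 01-ai/ repo prefix and the persona wording below are assumptions, not the prompt used in app.py:

```python
from huggingface_hub import InferenceClient

# Assumed full repo ID; the README only names "Yi-1.5 (34B-Chat)".
yi = InferenceClient("01-ai/Yi-1.5-34B-Chat")

# Illustrative persona prompt, not the one used in app.py.
messages = [
    {"role": "system", "content": "Reply like a friendly human: casual tone, short forms, a few emojis."},
    {"role": "user", "content": "hey, any plans for the weekend?"},
]
reply = yi.chat_completion(messages, max_tokens=200)
print(reply.choices[0].message.content)
```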

### 5. LLaVA (llava-interleave-qwen-0.5b-hf)

  • Purpose: Used for image-based Q&A and understanding visual content.
  • Usage: LLaVA processes and analyzes images, enabling the assistant to answer questions related to the visual content provided by the user.
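
A minimal local-inference sketch with transformers; the llava-hf/ repo prefix and the chat template follow the llava-hf model cards and should be treated as assumptions:

```python
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Assumed full repo ID; the README only names "llava-interleave-qwen-0.5b-hf".
model_id = "llava-hf/llava-interleave-qwen-0.5b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

def ask_about_image(image_path: str, question: str) -> str:
    image = Image.open(image_path)
    # Qwen-style chat template with an <image> placeholder, per the model card.
    prompt = f"<|im_start|>user <image>\n{question}<|im_end|><|im_start|>assistant"
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=100)
    return processor.decode(output[0], skip_special_tokens=True)
```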

## Pipeline Description

### Input Processing

H GPT4o distinguishes between different input types (text, images) and processes them accordingly:

  • Text Inputs: Direct user queries or prompts are sent to the respond function, which handles conversation flow and decides if additional functions like web search or image generation are needed.
  • Image Inputs: If an image is provided, the system utilizes LLaVA to analyze the image in context with the accompanying text. The assistant can then answer questions related to the visual content.
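
A minimal sketch of that routing decision, assuming Gradio's multimodal message format ({"text": ..., "files": [...]}); respond is the function named above with an assumed signature, and answer_about_image is a hypothetical stand-in for the LLaVA path:

```python
# Hypothetical helpers standing in for the LLaVA and conversation paths.
def answer_about_image(image_path: str, question: str) -> str: ...
def respond(text: str, history: list) -> str: ...

def route(message: dict, history: list) -> str:
    """Send image turns to the LLaVA path and text turns to respond()."""
    text = message.get("text", "")
    files = message.get("files", [])
    if files:
        # An image was attached: answer the question in its visual context.
        return answer_about_image(files[0], text)
    # Plain text: normal conversation flow (may trigger web search, image gen, ...).
    return respond(text, history)
```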

### Function Call Management

The assistant has access to several function calls to extend its capabilities:

  • Web Search: Initiated when the query requires external information, such as current events or detailed topics not covered by the model’s knowledge base.
  • Image Generation: Generates images based on user prompts, leveraging powerful text-to-image models.
  • Image Q&A: Answers questions related to images provided by the user.
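
One plausible wiring for these capabilities is a simple dispatch table; image_gen is named later in this README, while web_search and image_qa are hypothetical labels for the other two calls, and the text-to-image model ID is an assumption since the README does not name one:

```python
from huggingface_hub import InferenceClient

client = InferenceClient()  # default serverless Inference API client

def web_search(query: str) -> str:
    ...  # fetch and summarize results (see the Mixtral sketch above)

def image_gen(prompt: str):
    # Assumed text-to-image model; the README does not name the one app.py uses.
    return client.text_to_image(prompt, model="stabilityai/stable-diffusion-xl-base-1.0")

def image_qa(image_path: str, question: str) -> str:
    ...  # LLaVA path (see the LLaVA sketch above)

# Map the function call chosen by the conversation model to its handler.
FUNCTIONS = {"web_search": web_search, "image_gen": image_gen, "image_qa": image_qa}

def execute(name: str, **kwargs):
    return FUNCTIONS[name](**kwargs)
```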

### Conversational Flow

  1. Initial Response: The conversation begins with Gemma handling general queries and responses.
  2. Function Execution: Depending on the query, the assistant may call a specific function (e.g., web search, image generation).
  3. Web Search Integration: If a web search is required, the Mixtral model processes and summarizes the results.
  4. Image Generation: Image generation requests are handled by the image_gen function, which leverages external APIs or models.
  5. Fallbacks: If the primary models fail to produce a response, the LLaMA model takes over so the conversation continues uninterrupted (see the sketch after this list).
  6. Final Output: The response, along with any generated images or information, is returned to the user.
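
A minimal sketch of that fallback chain, assuming each model is wrapped in an InferenceClient-style object with a text_generation method; the ordering and error handling are illustrative:

```python
def generate_with_fallback(prompt: str, clients: list) -> str:
    """Try each model in priority order and return the first successful reply."""
    last_error = None
    for client in clients:  # e.g. [gemma, mixtral, llama] InferenceClient objects
        try:
            return client.text_generation(prompt, max_new_tokens=512)
        except Exception as err:  # timeouts, rate limits, model still loading, ...
            last_error = err
    raise RuntimeError("All models failed to respond") from last_error
```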

### Distinguishing Inputs

  • Text Inputs: Any input in plain text is treated as a user query or command, processed by the appropriate text-generation model.
  • Image Inputs: Files uploaded by the user are identified as images. The system automatically routes these to LLaVA for analysis. The combination of text and image is used to create a context-specific response.

## Example Usage

  • Text-Based Query: "What is the latest trend in AI technology?"
  • Image-Based Query: "Can you describe the content of this image?" (with an image file attached)
  • Image Generation: "Generate an image of a futuristic city at sunset."

## How It Works

  1. User Interaction: Users interact with H GPT4o through a chat interface, where they can input text and upload images.
  2. Input Processing: The system processes the input to determine the type (text or image) and routes it through the appropriate pipeline.
  3. Model Execution: Based on the input type, the system selects the relevant model and executes the necessary function (e.g., web search, image generation, image analysis).
  4. Response Generation: The system generates a response, which may include text, images, or both, and displays it to the user.
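
Putting the pieces together, the chat UI can be a multimodal gr.ChatInterface; this is a hedged sketch of the wiring rather than the actual app.py, with route standing in for the hypothetical router sketched earlier:

```python
import gradio as gr

def route(message: dict, history: list) -> str:
    ...  # hypothetical router: text vs. image handling, see the sketch above

demo = gr.ChatInterface(
    fn=route,
    multimodal=True,  # lets users attach images alongside their text
    title="H GPT4o - AI Assistant",
)

if __name__ == "__main__":
    demo.launch()
```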

## Installation

Ensure you have the following dependencies installed:

pip install gradio transformers huggingface-hub requests beautifulsoup4 Pillow

## Running the Application

To run the application, simply execute the following command:

python app.py

This will start the Gradio interface, allowing users to interact with H GPT4o.

## Conclusion

H GPT4o offers a versatile and interactive AI assistant capable of handling a wide range of tasks, from general conversation to advanced image analysis. Whether you're looking to generate images, search the web, or explore creative ideas, H GPT4o is designed to be your go-to assistant for all things AI.