HarshanaLF commited on
Commit
9b066ba
1 Parent(s): 8deae53
Files changed (1) hide show
  1. README.md +98 -1
README.md CHANGED
@@ -10,4 +10,101 @@ pinned: false
10
  license: afl-3.0
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
10
  license: afl-3.0
11
  ---
12
 
13
+ # H GPT4o - AI Assistant
14
+
15
+ ## Overview
16
+
17
+ **H GPT4o** is an AI-powered assistant that combines rich conversational abilities with advanced capabilities, such as image generation, web search, and Q&A with images. Built with cutting-edge AI models, it provides users with an engaging and powerful tool to explore creativity, gather information, and solve problems.
18
+
19
+ ## Model Description and Usage
20
+
21
+ ### 1. **Gemma (Mistral-7B-Instruct-v0.3)**
22
+
23
+ - **Purpose**: Used for general conversation and generating responses to user queries.
24
+ - **Usage**: This model processes text-based inputs and generates human-like responses. It also orchestrates function calls for specific tasks like web search, image generation, and image-based Q&A.
25
+
26
+ ### 2. **Mixtral (Nous-Hermes-2-Mixtral-8x7B-DPO)**
27
+
28
+ - **Purpose**: Handles complex queries, especially those requiring web-based information.
29
+ - **Usage**: Mixtral is responsible for summarizing web search results and generating detailed and structured responses based on external data.
30
+
31
+ ### 3. **LLaMA (Meta-Llama-3-8B-Instruct)**
32
+
33
+ - **Purpose**: Acts as a fallback model for text generation tasks.
34
+ - **Usage**: When other models fail to generate a response, LLaMA is used to continue the conversation, ensuring uninterrupted interaction.
35
+
36
+ ### 4. **Yi-1.5 (34B-Chat)**
37
+
38
+ - **Purpose**: Provides diverse and creative responses.
39
+ - **Usage**: Yi-1.5 enhances the assistant's ability to reply like a human friend with a friendly tone, short forms, and emojis.
40
+
41
+ ### 5. **LLaVA (llava-interleave-qwen-0.5b-hf)**
42
+
43
+ - **Purpose**: Used for image-based Q&A and understanding visual content.
44
+ - **Usage**: LLaVA processes and analyzes images, enabling the assistant to answer questions related to the visual content provided by the user.
45
+
46
+ ## Pipeline Description
47
+
48
+ ### Input Processing
49
+
50
+ H GPT4o distinguishes between different input types (text, images) and processes them accordingly:
51
+
52
+ - **Text Inputs**: Direct user queries or prompts are sent to the `respond` function, which handles conversation flow and decides if additional functions like web search or image generation are needed.
53
+ - **Image Inputs**: If an image is provided, the system utilizes LLaVA to analyze the image in context with the accompanying text. The assistant can then answer questions related to the visual content.
54
+
55
+ ### Function Call Management
56
+
57
+ The assistant has access to several function calls to extend its capabilities:
58
+
59
+ - **Web Search**: Initiated when the query requires external information, such as current events or detailed topics not covered by the model’s knowledge base.
60
+ - **Image Generation**: Generates images based on user prompts, leveraging powerful text-to-image models.
61
+ - **Image Q&A**: Answers questions related to images provided by the user.
62
+
63
+ ### Conversational Flow
64
+
65
+ 1. **Initial Response**: The conversation begins with Gemma handling general queries and responses.
66
+ 2. **Function Execution**: Depending on the query, the assistant may call a specific function (e.g., web search, image generation).
67
+ 3. **Web Search Integration**: If a web search is required, the Mixtral model processes and summarizes the results.
68
+ 4. **Image Generation**: Image generation requests are handled by the `image_gen` function, which leverages external APIs or models.
69
+ 5. **Fallbacks**: If primary models fail to provide a response, the LLaMA model is used to continue the conversation.
70
+ 6. **Final Output**: The response, along with any generated images or information, is returned to the user.
71
+
72
+ ### Distinguishing Inputs
73
+
74
+ - **Text Inputs**: Any input in plain text is treated as a user query or command, processed by the appropriate text-generation model.
75
+ - **Image Inputs**: Files uploaded by the user are identified as images. The system automatically routes these to LLaVA for analysis. The combination of text and image is used to create a context-specific response.
76
+
77
+ ### Example Usage
78
+
79
+ - **Text-Based Query**: "What is the latest trend in AI technology?"
80
+ - **Image-Based Query**: "Can you describe the content of this image?" (with an image file attached)
81
+ - **Image Generation**: "Generate an image of a futuristic city at sunset."
82
+
83
+ ## How It Works
84
+
85
+ 1. **User Interaction**: Users interact with H GPT4o through a chat interface, where they can input text and upload images.
86
+ 2. **Input Processing**: The system processes the input to determine the type (text or image) and routes it through the appropriate pipeline.
87
+ 3. **Model Execution**: Based on the input type, the system selects the relevant model and executes the necessary function (e.g., web search, image generation, image analysis).
88
+ 4. **Response Generation**: The system generates a response, which may include text, images, or both, and displays it to the user.
89
+
90
+ ## Installation
91
+
92
+ Ensure you have the following dependencies installed:
93
+
94
+ ```bash
95
+ pip install gradio transformers huggingface-hub requests beautifulsoup4 PIL
96
+ ```
97
+
98
+ ## Running the Application
99
+
100
+ To run the application, simply execute the following command:
101
+
102
+ ```bash
103
+ python app.py
104
+ ```
105
+
106
+ This will start the Gradio interface, allowing users to interact with H GPT4o.
107
+
108
+ ## Conclusion
109
+
110
+ H GPT4o offers a versatile and interactive AI assistant capable of handling a wide range of tasks, from general conversation to advanced image analysis. Whether you're looking to generate images, search the web, or explore creative ideas, H GPT4o is designed to be your go-to assistant for all things AI.