Shreyas094 committed: Update README.md
README.md
CHANGED
@@ -10,4 +10,66 @@ pinned: false
license: apache-2.0
---

# AI-powered Web Search and PDF Chat Assistant

This project combines large language models with web search and PDF document analysis in a single chat assistant. Users can ask questions about their uploaded PDF documents or use web search to answer their queries.

## Features

- **PDF Document Chat**: Upload multiple PDF documents and ask questions about them.
- **Web Search Integration**: Optionally answer queries using web search.
- **Multiple AI Models**: Choose from several language models.
- **Customizable Responses**: Adjust the temperature and the number of API calls to tune outputs.
- **User-friendly Interface**: Built with Gradio for a chat-style experience.
- **Document Selection**: Choose which uploaded documents to include in your queries.

## How It Works

1. **Document Processing**:
   - Uploaded PDF documents are parsed with either PyPDF or LlamaParse.
   - Parsed documents are stored in a FAISS vector database for efficient retrieval.

2. **Embedding**:
   - Uses Hugging Face embeddings (default: `sentence-transformers/all-mpnet-base-v2`) for document indexing and query matching.

3. **Query Processing**:
   - For PDF queries, relevant document sections are retrieved from the FAISS database.
   - For web searches, results are fetched using the DuckDuckGo search API.

4. **Response Generation**:
   - Queries are processed using the selected AI model (options include Mistral, Mixtral, and others).
   - Responses are generated based on the retrieved context (from PDFs or web search).

5. **User Interaction**:
   - Users chat with the AI, asking questions about uploaded documents or making general queries.
   - The interface allows adjusting model parameters and switching between PDF and web search modes.
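
The flow in steps 1 to 4 can be sketched with the Python standard library alone. This is a toy illustration under stated assumptions, not the project's code: word counts stand in for the Hugging Face embeddings, an in-memory list stands in for the FAISS index, and the names `ToyVectorStore`, `retrieve`, `build_prompt`, and the `web_search` callable are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for the FAISS vector database."""

    def __init__(self):
        self.entries = []  # list of (document name, chunk text, vector)

    def add_document(self, name, text):
        for chunk in chunk_text(text):
            self.entries.append((name, chunk, embed(chunk)))

    def search(self, query, k=2, selected=None):
        """Return the k most similar chunks, optionally limited to selected documents."""
        query_vector = embed(query)
        candidates = [e for e in self.entries if selected is None or e[0] in selected]
        ranked = sorted(candidates, key=lambda e: cosine(query_vector, e[2]), reverse=True)
        return [(name, chunk) for name, chunk, _ in ranked[:k]]


def retrieve(query, store, use_web=False, web_search=None, selected=None):
    """Pick the context source: the document store or a caller-supplied web search."""
    if use_web:
        if web_search is None:
            raise ValueError("web mode needs a web_search callable")
        return web_search(query)
    return store.search(query, selected=selected)


def build_prompt(query, context_chunks):
    """Assemble the text that would be sent to the selected language model."""
    context = "\n".join(f"[{name}] {chunk}" for name, chunk in context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


store = ToyVectorStore()
store.add_document("report.pdf", "Quarterly revenue grew by twelve percent, driven by cloud services.")
store.add_document("manual.pdf", "To reset the device, hold the power button for ten seconds.")

question = "How do I reset the device?"
hits = store.search(question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In a real deployment the `embed` and `ToyVectorStore` stand-ins would be replaced by the embedding model and FAISS index named above, and `web_search` by a function that queries DuckDuckGo.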

## Setup and Usage

1. Install the required dependencies (list of dependencies to be added).
2. Set the necessary API keys and tokens as environment variables.
3. Run the main script to launch the Gradio interface.
4. Upload PDF documents using the file input at the top of the interface.
5. Select the documents to query using the checkboxes.
6. Toggle between PDF chat and web search modes as needed.
7. Adjust the temperature and the number of API calls to tune responses.
8. Start chatting and asking questions!
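
Before launching the interface (step 3), it can help to confirm that the keys from step 2 are actually set. The sketch below is only an illustration: the README does not name the variables, so `HUGGINGFACE_TOKEN` and `LLAMA_CLOUD_API_KEY` are hypothetical placeholders to be replaced with whatever names the main script reads.

```python
import os

# Hypothetical variable names; replace with the names the project's code expects.
REQUIRED_ENV_VARS = ("HUGGINGFACE_TOKEN", "LLAMA_CLOUD_API_KEY")


def missing_env_vars(required=REQUIRED_ENV_VARS, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```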

## Models

The project supports multiple AI models, including:
- mistralai/Mistral-7B-Instruct-v0.3
- mistralai/Mixtral-8x7B-Instruct-v0.1
- meta/llama-3.1-8b-instruct
- mistralai/Mistral-Nemo-Instruct-2407
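
One way to guard model selection is to validate the chosen identifier against the list above. This is a minimal sketch, not the project's code: `resolve_model` is a hypothetical helper, treating the first entry as the default is an assumption, and the real list may contain more models than the four shown here.

```python
# Model identifiers copied from the list above; the project may support others.
SUPPORTED_MODELS = (
    "mistralai/Mistral-7B-Instruct-v0.3",
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "meta/llama-3.1-8b-instruct",
    "mistralai/Mistral-Nemo-Instruct-2407",
)

# Assumption: fall back to the first listed model when none is chosen.
DEFAULT_MODEL = SUPPORTED_MODELS[0]


def resolve_model(choice=None):
    """Return a supported model id, or raise ValueError for an unknown one."""
    if choice is None:
        return DEFAULT_MODEL
    if choice not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model: {choice!r}")
    return choice


print(resolve_model("meta/llama-3.1-8b-instruct"))
```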

## Future Improvements

- Integration of more embedding models for improved performance.
- Enhanced PDF parsing capabilities.
- Support for additional file formats beyond PDF.
- Improved caching for faster response times.

## Contribution

Contributions are welcome! Please submit issues or pull requests on the project's GitHub repository.