Spaces:
Runtime error
Runtime error
File size: 2,403 Bytes
4adab6f |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 |
---
title: "DocBot: Smart Document ChatBot"
emoji: π€
colorFrom: indigo
colorTo: purple
sdk: streamlit
sdk_version: "0.87.0"
app_file: app.py
pinned: false
---
# π€ DocBot: Smart Document ChatBot
DocBot is an intelligent document processing application with a chatbot interface. It can process various types of documents, including PDFs and images, extract essential information, and enable user interaction through a chat interface.
## βοΈ Features
- **Document Upload**: Upload PDF, PNG, JPG, or JPEG files for processing.
- **Text Extraction**: Extract text content from uploaded documents.
- **Image Processing**: Convert PDF documents to images and extract text from images.
- **Chatbot Interface**: Interact with the document through a chatbot interface powered by Groq.
- **Natural Language Understanding**: Utilizes spaCy for natural language processing.
- **Dynamic Progress Bar**: Visual feedback on document processing progress.
- **Error Handling**: Provides error messages for any processing failures.
## βοΈ Installation
1. Clone the repository:
```bash
git clone https://github.com/yourusername/docbot.git
```
2. Install the required Python packages:
```bash
pip install -r requirements.txt
```
3. Set up the environment variables:
Create a `.env` file in the root directory and add the following:
```dotenv
GROQ_API_KEY='your_groq_api_key'
```
4. Run the Streamlit app:
```bash
streamlit run app.py
```
## π Usage
1. Run the Streamlit app using the provided installation instructions.
2. Upload your document using the file uploader.
3. Wait for the document to be processed.
4. Interact with the document by asking questions in the chatbot interface.
## π» Technologies Used
- [Streamlit](https://streamlit.io/) - For building the interactive web application.
- [PyPDF2](https://pythonhosted.org/PyPDF2/) - For PDF document processing.
- [pdf2image](https://github.com/Belval/pdf2image) - For converting PDFs to images.
- [PyMuPDF](https://pypi.org/project/PyMuPDF/) - For PDF document rendering.
- [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) - For extracting text from images.
- [spaCy](https://spacy.io/) - For natural language processing.
- [Groq](https://github.com/groq/groq-py) - For AI-powered chatbot interaction.
- [Pillow](https://python-pillow.org/) - For image processing.
|