deepseek-chat / README.md
chalisesagun's picture
Update README.md
f8c5d54 verified

A newer version of the Streamlit SDK is available: 1.43.2

Upgrade
metadata
title: Deepseek RAG Chat Bot
emoji: πŸ“ˆ
colorFrom: red
colorTo: pink
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Deepseek-RAG-Chat-Bot

RAG-Powered Chatbot with Streamlit

This project is a Retrieval-Augmented Generation (RAG) chatbot built using Streamlit. It allows users to upload a PDF document, process it, and ask questions about its content. The application efficiently processes the document once and uses vector-based retrieval to answer queries.


Features

  • Upload PDF documents and process them into chunks for efficient querying.
  • Generate semantic embeddings using sentence-transformers.
  • Store embeddings in a FAISS vector database for efficient retrieval.
  • Use the DeepSeek API for question-answering capabilities.
  • Built with Streamlit for an interactive and user-friendly UI.

Requirements

  • Python 3.8 or higher

Dependencies

Install the required Python libraries:

streamlit==1.25.0
langchain==0.81.0
langchain-community==0.1.2
faiss-cpu==1.7.4
sentence-transformers==2.2.2
pypdf==3.8.1

To install all dependencies:

pip install -r requirements.txt

Setup and Usage

1. Clone the Repository

git clone https://github.com/your-username/rag-chatbot.git
cd rag-chatbot

2. Install Dependencies

pip install -r requirements.txt

3. Run the Application

Run the Streamlit application:

streamlit run app.py

4. Interact with the Chatbot

  1. Enter your DeepSeek API Key in the provided input field.
  2. Upload a PDF document.
  3. Ask questions about the content of the document.

Project Structure

.
β”œβ”€β”€ app.py              # Main application code
β”œβ”€β”€ requirements.txt    # List of dependencies
β”œβ”€β”€ README.md           # Documentation

Key Technologies Used

  1. Streamlit:

    • For building a user-friendly web interface.
  2. LangChain:

    • For document loading, text splitting, and RAG pipeline.
  3. FAISS:

    • For storing and querying vector embeddings.
  4. Sentence Transformers:

    • For generating semantic embeddings of text chunks.
  5. PyPDF:

    • For parsing PDF files.
  6. DeepSeek API:

    • For question-answering capabilities.

How It Works

  1. PDF Upload:

    • The user uploads a PDF document.
    • The document is split into manageable text chunks.
  2. Embeddings Generation:

    • Semantic embeddings are generated using sentence-transformers.
  3. Vector Storage:

    • The embeddings are stored in a FAISS vector database for efficient retrieval.
  4. Question Answering:

    • The user asks a question about the uploaded document.
    • The RAG pipeline retrieves relevant chunks and generates a response using the DeepSeek API.

Troubleshooting

  • Error: pypdf package not found Ensure pypdf is installed. Run:

    pip install pypdf
    
  • Error: langchain-community module not found Ensure langchain-community is installed. Run:

    pip install langchain-community
    
  • Reprocessing PDF on Every Query This issue is resolved by using st.session_state to persist the processed vector_store.


Future Improvements

  1. Add support for multiple file uploads.
  2. Integrate additional language models.
  3. Enhance the UI with better visualization of document content.
  4. Add support for other document formats (e.g., Word, TXT).

License

This project is licensed under the MIT License. See the LICENSE file for more details.


Contributions

Contributions are welcome! Feel free to fork the repository and submit a pull request.


Contact

For any queries or support, please contact:


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference