Komal01 committed on
Commit ccfb9e1 · verified · 1 Parent(s): 617971b

Upload 8 files

Files changed (8)
  1. .Dockerignore +2 -0
  2. .opik.config +5 -0
  3. Dockerfile +30 -0
  4. README.md +127 -11
  5. app.py +240 -0
  6. requirements.txt +0 -0
  7. start.sh +39 -0
  8. streamlit_app.py +76 -0
.Dockerignore ADDED
@@ -0,0 +1,2 @@
+ .env
+ __pycache__/
.opik.config ADDED
@@ -0,0 +1,5 @@
+ [opik]
+ url_override = https://www.comet.com/opik/api/
+ workspace = komalgupta991000-gmail-com
+ api_key = BX9OYn3NZBKuztCxL4XvMOeeI
+
Dockerfile ADDED
@@ -0,0 +1,30 @@
+ FROM python:3.11.4-slim-buster
+
+ # Install curl and Ollama
+ RUN apt-get update && apt-get install -y curl && \
+     curl -fsSL https://ollama.ai/install.sh | sh && \
+     apt-get clean && rm -rf /var/lib/apt/lists/*
+
+ # Set up a non-root user and environment
+ RUN useradd -m -u 1000 user
+ USER user
+ ENV HOME=/home/user \
+     PATH="/home/user/.local/bin:$PATH"
+
+ WORKDIR $HOME/app
+
+ # Install Python dependencies first to take advantage of layer caching
+ COPY --chown=user requirements.txt .
+ RUN pip install --no-cache-dir --upgrade -r requirements.txt
+
+ # Copy the application code
+ COPY --chown=user . .
+
+ # Make the start script executable
+ RUN chmod +x start.sh
+
+ # Expose FastAPI & Streamlit ports
+ EXPOSE 7860 8501
+
+ CMD ["./start.sh"]
README.md CHANGED
@@ -1,11 +1,127 @@
- ---
- title: Streaming RAG Chatbot
- emoji: 👀
- colorFrom: blue
- colorTo: yellow
- sdk: docker
- pinned: false
- license: apache-2.0
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # AI Assistant API
+
+ ## 🚀 Overview
+
+ This project is an AI-powered assistant that uses FastAPI and FAISS for retrieval-augmented generation (RAG). It answers user queries from a vector database of indexed documents and evaluates responses with Opik.
+
+ ## 🛠️ Features
+
+ - Upload and manage datasets
+ - Query the AI assistant with domain-specific constraints
+ - Use FAISS for efficient document retrieval
+ - Evaluate LLM responses using Opik
+
+ ## 📽️ Demo Video
+
+ [🎥 Click here to watch the demo](https://drive.google.com/file/d/10h4VnTm_y5SBczI6NnoTuqRxyq55HAn5/view?usp=sharing)
+
+ ## 📦 Installation
+
+ ### Install Ollama
+
+ Ollama is required for this project. Install it as follows:
+
+ ```bash
+ # For macOS
+ brew install ollama
+
+ # For Linux
+ curl -fsSL https://ollama.ai/install.sh | sh
+
+ # For Windows, download the installer from https://ollama.com/
+
+ # Verify the installation
+ ollama --version
+ ```
+
+ ### Clone and Set Up the Project
+
+ ```bash
+ # Clone the repository
+ git clone https://github.com/Komal-99/cyfuture_bot.git
+
+ # Navigate to the project directory
+ cd cyfuture_bot
+
+ # Install the Python dependencies
+ pip install -r requirements.txt
+ ```
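+
+ ### Configure environment variables
+
+ `app.py` loads Opik credentials from environment variables via `python-dotenv`. A minimal `.env` sketch (the variable names `OPIK_API_KEY` and `workspace` are the ones read in `app.py`; replace the placeholder values with your own Comet/Opik credentials):
+
+ ```bash
+ # .env (placeholders — substitute your own values)
+ OPIK_API_KEY=your_opik_api_key
+ workspace=your_opik_workspace
+ ```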
+
+ ## 🚀 Usage
+
+ ### Start the Project
+
+ Run the `start.sh` script to set up and launch the application:
+
+ ```bash
+ chmod +x start.sh
+ ./start.sh
+ ```
+
+ This script:
+
+ - Sets environment variables for optimization
+ - Starts Ollama in the background and waits for it to initialize
+ - Pulls the required models (deepseek-r1:7b, nomic-embed-text) if they are not already present
+ - Launches the FastAPI server on http://127.0.0.1:7860 and the Streamlit UI on http://127.0.0.1:8501
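+
+ ### Run with Docker (optional)
+
+ The included `Dockerfile` bundles Ollama, the FastAPI server, and the Streamlit UI. A minimal build-and-run sketch (the `ai-assistant` image tag is arbitrary; the models are pulled inside the container on first start, so the first run can take a while):
+
+ ```bash
+ # Build the image (tag name is arbitrary)
+ docker build -t ai-assistant .
+
+ # Run it, publishing the FastAPI (7860) and Streamlit (8501) ports
+ docker run -p 7860:7860 -p 8501:8501 ai-assistant
+ ```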
+
+ ## API Endpoints
+
+ ### Upload Dataset
+
+ ```
+ POST /upload_dataset/    # Upload an Excel dataset to be used for evaluation.
+ ```
+
+ ### Run Evaluation
+
+ ```
+ POST /run_evaluation/    # Evaluate the model's performance using Opik.
+ ```
+
+ ### Query AI Assistant
+
+ ```
+ GET /query/?input_text=your_question    # Ask the assistant a question. The model retrieves relevant documents and streams an answer based on them.
+ ```
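+
+ With the stack running locally, you can exercise these endpoints with `curl`, for example (the endpoint paths and the `file` / `input_text` parameter names come from `app.py`; the dataset path is just an example):
+
+ ```bash
+ # Upload an Excel dataset for evaluation
+ curl -X POST -F "file=@dataset.xlsx" http://127.0.0.1:7860/upload_dataset/
+
+ # Run the Opik evaluation
+ curl -X POST http://127.0.0.1:7860/run_evaluation/
+
+ # Ask a question (the answer is streamed back as plain text)
+ curl -G "http://127.0.0.1:7860/query/" --data-urlencode "input_text=How do I register for a new connection?"
+ ```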
+
+ ## 📂 Folder Structure
+
+ ```
+ .
+ ├── AI_Agent/            # Data source documents
+ ├── deepseek_cyfuture/   # DeepSeek FAISS vector DB
+ ├── .env                 # Environment variables
+ ├── .gitignore           # Files to ignore in Git
+ ├── dataset.xlsx         # Sample dataset file
+ ├── Dockerfile           # Docker configuration
+ ├── requirements.txt     # Python dependencies
+ ├── start.sh             # Startup script
+ ├── app.py               # Main application file (FastAPI)
+ ├── streamlit_app.py     # Streamlit UI
+ └── README.md            # Project documentation
+ ```
+
+ ## 🤝 Contributing
+
+ Contributions are welcome! Please follow these steps:
+
+ 1. Fork the repository
+ 2. Create a new branch (`git checkout -b feature-branch`)
+ 3. Commit your changes (`git commit -m 'Add new feature'`)
+ 4. Push to the branch (`git push origin feature-branch`)
+ 5. Create a pull request
+
+ ## 📜 License
+
+ This project is licensed under the MIT License - see the LICENSE file for details.
+
+ ## 📬 Contact
+
+ For questions or issues, reach out:
+
+ - GitHub: https://github.com/Komal-99
app.py ADDED
@@ -0,0 +1,240 @@
+ import os
+ import re
+ import pandas as pd
+ import backoff
+ import asyncio
+ from datetime import datetime
+ from dotenv import load_dotenv
+ from langchain_ollama import OllamaEmbeddings, ChatOllama
+ from langchain_community.vectorstores import FAISS
+
+ from langchain_core.prompts import ChatPromptTemplate
+ from langchain_core.output_parsers import StrOutputParser
+ from langchain_core.runnables import RunnablePassthrough
+ from opik import Opik, track, evaluate
+ from opik.evaluation.metrics import Hallucination, AnswerRelevance
+ import litellm
+ import opik
+ from fastapi.responses import StreamingResponse
+ from litellm.integrations.opik.opik import OpikLogger
+ from litellm import completion, APIConnectionError
+ from fastapi import FastAPI, UploadFile, File, HTTPException, Query, Response
+
+ from langchain.document_loaders import PyMuPDFLoader, UnstructuredWordDocumentLoader
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
+
+ app = FastAPI()
+
+ def initialize_opik():
+     opik_logger = OpikLogger()
+     litellm.callbacks = [opik_logger]
+     opik.configure(api_key=os.getenv("OPIK_API_KEY"), workspace=os.getenv("workspace"), force=True)
+
+
+ # Initialize Opik and load environment variables
+ load_dotenv()
+ initialize_opik()
+
+ # Initialize Opik Client
+ dataset = Opik().get_or_create_dataset(
+     name="Cyfuture_faq",
+     description="Dataset on IGL FAQ",
+ )
+ @app.post("/upload_dataset/")
+ def upload_dataset(file: UploadFile = File(...)):
+     try:
+         df = pd.read_excel(file.file)
+         dataset.insert(df.to_dict(orient='records'))
+         return {"message": "Dataset uploaded successfully"}
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e))
+
+ # Manual helper: load dataset.xlsx directly into the Opik dataset for the evaluation task
+ def upload_dataset_from_file():
+     df = pd.read_excel("dataset.xlsx")
+     dataset.insert(df.to_dict(orient='records'))
+     return "Dataset uploaded successfully"
+
+ # Initialize the LLM model
+ model = ChatOllama(model="deepseek-r1:7b", base_url="http://localhost:11434", temperature=0.2, max_tokens=200)
+
+ def load_documents(folder_path):
+     text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
+     all_documents = []
+     os.makedirs('data', exist_ok=True)
+
+     for filename in os.listdir(folder_path):
+         file_path = os.path.join(folder_path, filename)
+
+         if filename.endswith('.pdf'):
+             loader = PyMuPDFLoader(file_path)
+         elif filename.endswith('.docx'):
+             loader = UnstructuredWordDocumentLoader(file_path)
+         else:
+             continue  # Skip unsupported files
+
+         documents = loader.load()
+         all_documents.extend(text_splitter.split_documents(documents))
+         print(f"Processed and indexed {filename}")
+
+     return all_documents
+
+ # Vector Store Setup
+ def setup_vector_store(documents):
+     embeddings = OllamaEmbeddings(model='nomic-embed-text', base_url="http://localhost:11434")
+     vectorstore = FAISS.from_documents(documents, embeddings)
+     vectorstore.save_local("deepseek_cyfuture")
+     return vectorstore
+
+
+ # Create RAG Chain
+ def create_rag_chain(retriever):
+     prompt_template = ChatPromptTemplate.from_template(
+         """
+         You are an AI question-answering assistant specialized in answering user queries strictly from the provided context. Give a detailed answer to the user's question, considering the context.
+
+         STRICT RULES:
+         - You *must not* answer any questions outside the provided context.
+         - If the question is unrelated to billing, payments, customer, or meter reading, respond with exactly:
+           **"This question is outside my specialized domain."**
+         - Do NOT attempt to generate an answer from loosely related context.
+         - If the context does not contain a valid answer, simply state: **"I don't know the answer."**
+
+         VALIDATION STEP:
+         1. Check if the query is related to **billing, payments, customer, or meter reading**.
+         2. If NOT, respond with: `"This question is outside my specialized domain."` and nothing else.
+         3. If the context does not contain directly relevant data, try to find the best possible answer from the context.
+         4. Do NOT generate speculative answers.
+         5. If the generated answer does not address the question, look for the best possible answer in the context; you may add more relevant context to the answer.
+
+         Question: {question}
+         Context: {context}
+         Answer:
+         """
+     )
+     return (
+         {"context": retriever | format_docs, "question": RunnablePassthrough()}
+         | prompt_template
+         | model
+         | StrOutputParser()
+     )
+
+ def format_docs(docs):
+     return "\n\n".join(doc.page_content for doc in docs)
+
+ def clean_response(response):
+     return re.sub(r'<think>.*?</think>', '', response, flags=re.DOTALL).strip()
+
+
+ @track()
+ def llm_chain(input_text):
+     try:
+         context = "\n".join(doc.page_content for doc in retriever.invoke(input_text))
+         response = "".join(chunk for chunk in rag_chain.stream(input_text) if isinstance(chunk, str))
+         return {"response": clean_response(response), "context_used": context}
+     except Exception as e:
+         return {"error": str(e)}
+
+ def evaluation_task(x):
+     try:
+         result = llm_chain(x['user_question'])
+         return {"input": x['user_question'], "output": result["response"], "context": result["context_used"], "expected": x['expected_output']}
+     except Exception as e:
+         return {"input": x['user_question'], "output": "", "context": "", "expected": x['expected_output']}
+
+ # experiment_name = f"Deepseek_{dataset.name}_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}"
+ # metrics = [Hallucination(model=model1), AnswerRelevance(model=model1)]
+
+
+ @app.post("/run_evaluation/")
+ @backoff.on_exception(backoff.expo, (APIConnectionError, Exception), max_tries=3, max_time=300)
+ def run_evaluation():
+     experiment_name = f"Deepseek_{dataset.name}_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}"
+     metrics = [Hallucination(), AnswerRelevance()]
+     try:
+         evaluate(
+             experiment_name=experiment_name,
+             dataset=dataset,
+             task=evaluation_task,
+             scoring_metrics=metrics,
+             experiment_config={"model": model},
+             task_threads=2
+         )
+         return {"message": "Evaluation completed successfully"}
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e))
+
+
+ # @backoff.on_exception(backoff.expo, (APIConnectionError, Exception), max_tries=3, max_time=300)
+ # def run_evaluation():
+ #     return evaluate(experiment_name=experiment_name, dataset=dataset, task=evaluation_task, scoring_metrics=metrics, experiment_config={"model": model}, task_threads=2)
+
+ # run_evaluation()
+
+ # Create Vector Database
+ def create_db():
+     source = r'AI Agent'
+     markdown_content = load_documents(source)
+     setup_vector_store(markdown_content)
+     return "Database created successfully"
+
+ # Load the persisted vector store and build the RAG chain
+ embeddings = OllamaEmbeddings(model='nomic-embed-text', base_url="http://localhost:11434")
+ vectorstore = FAISS.load_local("deepseek_cyfuture", embeddings, allow_dangerous_deserialization=True)
+ retriever = vectorstore.as_retriever(search_kwargs={'k': 2})
+ rag_chain = create_rag_chain(retriever)
+
+ @app.get("/query/")
+ @track()
+ def chain(input_text: str = Query(..., description="Enter your question")):
+     try:
+         def generate():
+             # Stream raw chunks (including the model's <think> block);
+             # the client is responsible for splitting on "</think>".
+             for chunk in rag_chain.stream(input_text):
+                 if isinstance(chunk, str):
+                     yield chunk
+
+         return StreamingResponse(generate(), media_type="text/plain")
+
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e))
+ @app.get("/")
+ def read_root():
+     return {"message": "Welcome to the AI Assistant API!"}
+
+ if __name__ == "__main__":
+     # Start the FastAPI app directly (start.sh runs it via uvicorn on 0.0.0.0 instead)
+     import uvicorn
+     uvicorn.run(app, host="127.0.0.1", port=7860)
+
+
+ # questions = ["Is the website accessible through mobile also? please tell the benefits of it", "How do I register for a new connection?", "how to make payments?"]
+ # # Questions for retrieval
+ # # Answer questions
+ # create_db()
+ # # Load Vector Store
+ # embeddings = OllamaEmbeddings(model='nomic-embed-text', base_url="http://localhost:11434")
+ # vectorstore = FAISS.load_local("deepseek_cyfuture", embeddings, allow_dangerous_deserialization=True)
+ # retriever = vectorstore.as_retriever(search_kwargs={'k': 3})
+ # rag_chain = create_rag_chain(retriever)
+
+ # for question in questions:
+ #     print(f"Question: {question}")
+ #     for chunk in rag_chain.stream(question):
+ #         print(chunk, end="", flush=True)
+ #     print("\n" + "-" * 50 + "\n")
requirements.txt ADDED
Binary file (8.09 kB).
 
start.sh ADDED
@@ -0,0 +1,39 @@
+ #!/bin/bash
+
+ # Set environment variables for optimization
+ export OMP_NUM_THREADS=4
+ export MKL_NUM_THREADS=4
+ export CUDA_VISIBLE_DEVICES=0,1
+
+ # Start Ollama in the background
+ ollama serve &
+
+ # Wait for Ollama to start up before pulling models
+ max_attempts=30
+ attempt=0
+ while ! curl -s http://localhost:11434/api/tags >/dev/null; do
+     sleep 1
+     attempt=$((attempt + 1))
+     if [ $attempt -eq $max_attempts ]; then
+         echo "Ollama failed to start within 30 seconds. Exiting."
+         exit 1
+     fi
+ done
+
+ echo "Ollama is ready."
+
+ # Pull the models if not already present
+ if ! ollama list | grep -q "deepseek-r1:7b"; then
+     ollama pull deepseek-r1:7b
+ fi
+ if ! ollama list | grep -q "nomic-embed-text"; then
+     ollama pull nomic-embed-text
+ fi
+
+ # Print the API URL
+ echo "API is running on: http://0.0.0.0:7860"
+
+ # Start FastAPI in the background
+ uvicorn app:app --host 0.0.0.0 --port 7860 --workers 4 --limit-concurrency 20 &
+
+ # Start Streamlit for the UI
+ streamlit run streamlit_app.py --server.port 8501 --server.address 0.0.0.0
streamlit_app.py ADDED
@@ -0,0 +1,76 @@
+
+ import streamlit as st
+ import requests
+ import re  # For whitespace cleanup
+
+ st.set_page_config(page_title="AI Chatbot", layout="centered")
+ st.title("🤖 AI Chatbot")
+
+ if "messages" not in st.session_state:
+     st.session_state.messages = []
+
+ # Query the AI API and stream the response
+ def query_ai(question):
+     url = "http://127.0.0.1:7860/query/"
+     params = {"input_text": question}
+
+     with requests.get(url, params=params, stream=True) as response:
+         if response.status_code == 200:
+             full_response = ""
+             for chunk in response.iter_content(chunk_size=1024):
+                 if chunk:
+                     text_chunk = chunk.decode("utf-8")
+                     full_response += text_chunk
+                     yield full_response  # Yield the accumulated response so far
+         else:
+             yield f"Error: API returned status code {response.status_code}"
+
+ # Custom CSS for spacing fix
+ st.markdown("""
+     <style>
+     .chat-box {
+         background-color: #1e1e1e;
+         padding: 12px;
+         border-radius: 10px;
+         margin-top: 5px;
+         font-size: 15px;
+         font-family: monospace;
+         white-space: pre-wrap;
+         word-wrap: break-word;
+         line-height: 1.2;
+         color: #ffffff;
+     }
+     </style>
+ """, unsafe_allow_html=True)
+
+ user_input = st.text_input("Ask a question:", "", key="user_input")
+ submit_button = st.button("Submit")
+
+ if submit_button and user_input:
+     st.session_state.messages.append({"role": "user", "content": user_input})
+
+     # Placeholder for streaming
+     response_container = st.empty()
+     full_response = ""
+
+     with st.spinner("🤖 AI is thinking..."):
+         for chunk in query_ai(user_input):
+             full_response = chunk
+             response_container.markdown(f'<div class="chat-box">{full_response}</div>', unsafe_allow_html=True)
+
+     response_container.empty()  # Hide the streamed "thinking" output after completion
+
+     # Extract the refined answer after "</think>"
+     if "</think>" in full_response:
+         refined_response = full_response.split("</think>", 1)[-1].strip()
+     else:
+         refined_response = full_response  # Fallback if </think> is missing
+
+     # Remove extra newlines and excessive spaces
+     refined_response = re.sub(r'\n\s*\n', '\n', refined_response.strip())
+
+     # Expandable AI thought-process box
+     with st.expander("🤖 AI's Thought Process (Click to Expand)"):
+         st.markdown(f'<div class="chat-box">{full_response}</div>', unsafe_allow_html=True)
+
+     # Display the refined answer with clean formatting
+     st.write("Answer:")
+     st.markdown(refined_response, unsafe_allow_html=True)