Commit · d32bdc1 · 1 Parent(s): 46e25d2
docs: update MiniDoc and README.md for improved clarity and feature descriptions
- Docs/MiniDoc.md +52 -25
- README.md +69 -48
- README_hf.md +69 -48
- src/crawlgpt/core/database.py +84 -2
Docs/MiniDoc.md
CHANGED
@@ -10,32 +10,41 @@ CrawlGPT is a web content crawler with GPT-powered summarization and chat capabi
 crawlgpt/
 ├── src/
 │   └── crawlgpt/
-│       ├── core/
-│       │   ├──
-│       │   ├── LLMBasedCrawler.py
-│       │
-│
-│
-│       │
-│
-│
-│
-│       ├──
-│       ├──
-│
-├──
+│       ├── core/                      # Core functionality
+│       │   ├── database.py            # SQL database handling
+│       │   ├── LLMBasedCrawler.py     # Main crawler implementation
+│       │   ├── DatabaseHandler.py     # Vector database (FAISS)
+│       │   └── SummaryGenerator.py    # Text summarization
+│       ├── ui/                        # User Interface
+│       │   ├── chat_app.py            # Main Streamlit app
+│       │   ├── chat_ui.py             # Development UI
+│       │   └── login.py               # Authentication UI
+│       └── utils/                     # Utilities
+│           ├── content_validator.py   # URL/content validation
+│           ├── data_manager.py        # Import/export handling
+│           ├── helper_functions.py    # General helpers
+│           ├── monitoring.py          # Metrics collection
+│           └── progress.py            # Progress tracking
+├── tests/                             # Test suite
 │   └── test_core/
-│       ├── test_database_handler.py
-│       ├── test_integration.py
-│       ├── test_llm_based_crawler.py
-│       └── test_summary_generator.py
-├── .
-
-
-├── Docs
-
-├──
-
+│       ├── test_database_handler.py   # Vector DB tests
+│       ├── test_integration.py        # Integration tests
+│       ├── test_llm_based_crawler.py  # Crawler tests
+│       └── test_summary_generator.py  # Summarizer tests
+├── .github/                           # CI/CD
+│   └── workflows/
+│       └── Push_to_hf.yaml            # HuggingFace sync
+├── Docs/
+│   └── MiniDoc.md                     # Documentation
+├── .dockerignore                      # Docker exclusions
+├── .gitignore                         # Git exclusions
+├── Dockerfile                         # Container config
+├── LICENSE                            # MIT License
+├── README.md                          # Project documentation
+├── README_hf.md                       # HuggingFace README
+├── pyproject.toml                     # Project metadata
+├── pytest.ini                         # Test configuration
+└── setup_env.py                       # Environment setup
 ```

 ## Core Components
@@ -59,6 +68,24 @@ crawlgpt/
 - Configurable model selection and parameters
 - Handles empty input validation

+### [Database](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/core/database.py) (src/crawl/core/database.py)
+
+- SQLAlchemy-based database handling for user management and chat history
+- Provides secure user authentication with BCrypt password hashing
+- Manages persistent storage of chat conversations and context
+
+- Configuration
+  - Uses SQLite by default (`sqlite:///crawlgpt.db`)
+  - Configurable via DATABASE_URL environment variable
+  - Automatic schema creation on startup
+  - Session management with SQLAlchemy sessionmaker
+- Security Features
+  - BCrypt password hashing with PassLib
+  - Unique username enforcement
+  - Secure session handling
+  - Role-based message tracking
+
+
 ## UI Components

 ### [chat_app.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/ui/chat_app.py) (src/crawlgpt/ui/chat_app.py)
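Review note: the Database section added to MiniDoc.md names three behaviours: a SQLite default that `DATABASE_URL` can override, unique usernames, and hashed passwords. The sketch below illustrates those ideas with the standard library only and is not the project's code. It assumes `sqlite3` in place of SQLAlchemy and PBKDF2 from `hashlib` as a stand-in for BCrypt/PassLib; the helper names (`resolve_database_url`, `hash_password`, `verify_password`, `open_demo_db`) and the table layout are hypothetical.

```python
import hashlib
import hmac
import os
import sqlite3

DEFAULT_URL = "sqlite:///crawlgpt.db"  # default quoted in the MiniDoc section


def resolve_database_url(env=None):
    """Return DATABASE_URL if it is set, otherwise the SQLite default."""
    env = os.environ if env is None else env
    return env.get("DATABASE_URL", DEFAULT_URL)


def hash_password(password, salt=None):
    """Salted PBKDF2 hash (assumed stand-in for BCrypt): 'salt_hex$digest_hex'."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt.hex() + "$" + digest.hex()


def verify_password(password, stored):
    """Re-hash with the stored salt and compare in constant time."""
    salt_hex, _ = stored.split("$", 1)
    candidate = hash_password(password, bytes.fromhex(salt_hex))
    return hmac.compare_digest(candidate, stored)


def open_demo_db():
    """In-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, "
        "username TEXT UNIQUE NOT NULL, password_hash TEXT NOT NULL)"
    )
    return conn


def create_user(conn, username, password):
    """Insert a user; return False when the UNIQUE constraint rejects the name."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (username, password_hash) VALUES (?, ?)",
                (username, hash_password(password)),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Letting the database enforce uniqueness, rather than querying first and inserting afterwards, avoids a race between two concurrent sign-ups; whether the project needs that guarantee is a judgment call for the author.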
README.md
CHANGED
@@ -1,27 +1,42 @@
-#
+# CrawlGPT 🤖

-A powerful web content crawler with LLM-powered
+A powerful web content crawler with LLM-powered RAG (Retrieval Augmented Generation) capabilities. CrawlGPT extracts content from URLs, processes it through intelligent summarization, and enables natural language interactions using modern LLM technology.

-## Features
+## Key Features

-
-
-
-
--
-
-
--
-
-
-
--
-
-
-
--
-
-
+### Core Features
+- **Intelligent Web Crawling**
+  - Async web content extraction using Playwright
+  - Smart rate limiting and validation
+  - Configurable crawling strategies
+
+- **Advanced Content Processing**
+  - Automatic text chunking and summarization
+  - Vector embeddings via FAISS
+  - Context-aware response generation
+
+- **Streamlit Chat Interface**
+  - Clean, responsive UI
+  - Real-time content processing
+  - Conversation history
+  - User authentication
+
+### Technical Features
+- **Vector Database**
+  - FAISS-powered similarity search
+  - Efficient content retrieval
+  - Persistent storage
+
+- **User Management**
+  - SQLite database backend
+  - Secure password hashing
+  - Chat history tracking
+
+- **Monitoring & Utils**
+  - Request metrics collection
+  - Progress tracking
+  - Data import/export
+  - Content validation

 ## 🎥 Demo
 ### [Deployed APP 🤖](https://huggingface.co/spaces/jatinmehra/CRAWL-GPT-CHAT)
@@ -109,35 +124,41 @@ _Example of CRAWLGPT in action!_
 crawlgpt/
 ├── src/
 │   └── crawlgpt/
-│       ├── core/
-│       │   ├──
-│       │   ├── LLMBasedCrawler.py
-│       │
-│
-│
-│       │
-│
-│
-│
-│       ├──
-│       ├──
-│
-├──
+│       ├── core/                      # Core functionality
+│       │   ├── database.py            # SQL database handling
+│       │   ├── LLMBasedCrawler.py     # Main crawler implementation
+│       │   ├── DatabaseHandler.py     # Vector database (FAISS)
+│       │   └── SummaryGenerator.py    # Text summarization
+│       ├── ui/                        # User Interface
+│       │   ├── chat_app.py            # Main Streamlit app
+│       │   ├── chat_ui.py             # Development UI
+│       │   └── login.py               # Authentication UI
+│       └── utils/                     # Utilities
+│           ├── content_validator.py   # URL/content validation
+│           ├── data_manager.py        # Import/export handling
+│           ├── helper_functions.py    # General helpers
+│           ├── monitoring.py          # Metrics collection
+│           └── progress.py            # Progress tracking
+├── tests/                             # Test suite
 │   └── test_core/
-│       ├── test_database_handler.py
-│       ├── test_integration.py
-│       ├── test_llm_based_crawler.py
-│       └── test_summary_generator.py
-├── .github/
+│       ├── test_database_handler.py   # Vector DB tests
+│       ├── test_integration.py        # Integration tests
+│       ├── test_llm_based_crawler.py  # Crawler tests
+│       └── test_summary_generator.py  # Summarizer tests
+├── .github/                           # CI/CD
 │   └── workflows/
-│       └── Push_to_hf.yaml
-├──
-
-├──
-├──
-├──
-├──
-
+│       └── Push_to_hf.yaml            # HuggingFace sync
+├── Docs/
+│   └── MiniDoc.md                     # Documentation
+├── .dockerignore                      # Docker exclusions
+├── .gitignore                         # Git exclusions
+├── Dockerfile                         # Container config
+├── LICENSE                            # MIT License
+├── README.md                          # Project documentation
+├── README_hf.md                       # HuggingFace README
+├── pyproject.toml                     # Project metadata
+├── pytest.ini                         # Test configuration
+└── setup_env.py                       # Environment setup
 ```

 ## 🧪 Testing
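Review note: the README now advertises "Automatic text chunking" without saying how text is split. As a rough illustration only, the sketch below shows one common approach, fixed-size character windows with overlap. The function name `chunk_text`, the character-based sizing, and the default sizes are assumptions for this note, not the crawler's actual behaviour.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunk_size-character windows that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, which tends to help retrieval at the cost of some duplicated text.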
README_hf.md
CHANGED
@@ -8,30 +8,45 @@ colorTo: blue
 pinned: true
 short_description: A powerful web content crawler with LLM-powered RAG.
 ---
-#
+# CrawlGPT 🤖

-A powerful web content crawler with LLM-powered
+A powerful web content crawler with LLM-powered RAG (Retrieval Augmented Generation) capabilities. CrawlGPT extracts content from URLs, processes it through intelligent summarization, and enables natural language interactions using modern LLM technology.

-## Features
+## Key Features

-
-
-
-
--
-
-
--
-
-
-
--
-
-
-
--
-
-
+### Core Features
+- **Intelligent Web Crawling**
+  - Async web content extraction using Playwright
+  - Smart rate limiting and validation
+  - Configurable crawling strategies
+
+- **Advanced Content Processing**
+  - Automatic text chunking and summarization
+  - Vector embeddings via FAISS
+  - Context-aware response generation
+
+- **Streamlit Chat Interface**
+  - Clean, responsive UI
+  - Real-time content processing
+  - Conversation history
+  - User authentication
+
+### Technical Features
+- **Vector Database**
+  - FAISS-powered similarity search
+  - Efficient content retrieval
+  - Persistent storage
+
+- **User Management**
+  - SQLite database backend
+  - Secure password hashing
+  - Chat history tracking
+
+- **Monitoring & Utils**
+  - Request metrics collection
+  - Progress tracking
+  - Data import/export
+  - Content validation

 ## 🎥 Demo
 ### [Deployed APP 🤖](https://huggingface.co/spaces/jatinmehra/CRAWL-GPT-CHAT)
@@ -119,35 +134,41 @@ _Example of CRAWLGPT in action!_
 crawlgpt/
 ├── src/
 │   └── crawlgpt/
-│       ├── core/
-│       │   ├──
-│       │   ├── LLMBasedCrawler.py
-│       │
-│
-│
-│       │
-│
-│
-│
-│       ├──
-│       ├──
-│
-├──
+│       ├── core/                      # Core functionality
+│       │   ├── database.py            # SQL database handling
+│       │   ├── LLMBasedCrawler.py     # Main crawler implementation
+│       │   ├── DatabaseHandler.py     # Vector database (FAISS)
+│       │   └── SummaryGenerator.py    # Text summarization
+│       ├── ui/                        # User Interface
+│       │   ├── chat_app.py            # Main Streamlit app
+│       │   ├── chat_ui.py             # Development UI
+│       │   └── login.py               # Authentication UI
+│       └── utils/                     # Utilities
+│           ├── content_validator.py   # URL/content validation
+│           ├── data_manager.py        # Import/export handling
+│           ├── helper_functions.py    # General helpers
+│           ├── monitoring.py          # Metrics collection
+│           └── progress.py            # Progress tracking
+├── tests/                             # Test suite
 │   └── test_core/
-│       ├── test_database_handler.py
-│       ├── test_integration.py
-│       ├── test_llm_based_crawler.py
-│       └── test_summary_generator.py
-├── .github/
+│       ├── test_database_handler.py   # Vector DB tests
+│       ├── test_integration.py        # Integration tests
+│       ├── test_llm_based_crawler.py  # Crawler tests
+│       └── test_summary_generator.py  # Summarizer tests
+├── .github/                           # CI/CD
 │   └── workflows/
-│       └── Push_to_hf.yaml
-├──
-
-├──
-
-├──
-├──
-
+│       └── Push_to_hf.yaml            # HuggingFace sync
+├── Docs/
+│   └── MiniDoc.md                     # Documentation
+├── .dockerignore                      # Docker exclusions
+├── .gitignore                         # Git exclusions
+├── Dockerfile                         # Container config
+├── LICENSE                            # MIT License
+├── README.md                          # Project documentation
+├── README_hf.md                       # HuggingFace README
+├── pyproject.toml                     # Project metadata
+├── pytest.ini                         # Test configuration
+└── setup_env.py                       # Environment setup
 ```

 ## 🧪 Testing
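Review note: both READMEs list "FAISS-powered similarity search". For readers unfamiliar with the idea, the sketch below shows what a nearest-neighbour lookup computes, using plain Python and cosine similarity. It is a brute-force stand-in, not FAISS and not the project's `DatabaseHandler`; the names `cosine_similarity` and `top_k` are hypothetical, and the metric the project actually uses is not stated in this diff.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the indices of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    # Highest similarity first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]
```

A vector index such as FAISS answers the same question but avoids scanning every stored vector, which matters once the number of chunks grows large.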
src/crawlgpt/core/database.py
CHANGED
@@ -1,4 +1,6 @@
-#
+# This module provides SQLAlchemy models and database utilities for user management
+# and chat history persistence.
+
 from sqlalchemy import create_engine, Column, Integer, String, Text, DateTime, ForeignKey
 from sqlalchemy.ext.declarative import declarative_base
 from sqlalchemy.orm import sessionmaker, relationship
@@ -6,10 +8,23 @@ from datetime import datetime
 from passlib.context import CryptContext
 import os

+# SQLAlchemy models
 Base = declarative_base()
+
+# Password hashing
 pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")

 class User(Base):
+    """User model for authentication and chat history tracking.
+
+    Attributes:
+        id (int): Primary key
+        username (str): Unique username, max 50 chars
+        password_hash (str): BCrypt hashed password, 60 chars
+        email (str): User email, max 100 chars
+        created_at (datetime): Account creation timestamp
+        chats (relationship): One-to-many relationship to ChatHistory
+    """
     __tablename__ = 'users'
     id = Column(Integer, primary_key=True)
     username = Column(String(50), unique=True)
@@ -19,6 +34,17 @@ class User(Base):
     chats = relationship("ChatHistory", back_populates="user")

 class ChatHistory(Base):
+    """ChatHistory model for storing chat messages.
+
+    Attributes:
+        id (int): Primary key
+        user_id (int): Foreign key to User
+        message (str): Chat message content
+        role (str): Role of the message sender ('user' or 'assistant')
+        context (str): Context of the chat message
+        timestamp (datetime): Timestamp of the message
+        user (relationship): Many-to-one relationship to User
+    """
     __tablename__ = 'chat_history'
     id = Column(Integer, primary_key=True)
     user_id = Column(Integer, ForeignKey('users.id'))
@@ -28,12 +54,22 @@ class ChatHistory(Base):
     timestamp = Column(DateTime, default=datetime.utcnow)
     user = relationship("User", back_populates="chats")

+# Database initialization
 engine = create_engine(os.getenv('DATABASE_URL', 'sqlite:///crawlgpt.db'))
 Base.metadata.create_all(bind=engine)
 Session = sessionmaker(bind=engine)

 # Database operations
 def create_user(username: str, password: str, email: str):
+    """
+    Creates a new user in the database
+    Args:
+        username (str): Username
+        password (str): Password
+        email (str): Email
+    Returns:
+        bool: True if user is created, False if username is taken
+    """
     with Session() as session:
         if session.query(User).filter(User.username == username).first():
             return False
@@ -44,6 +80,14 @@ def create_user(username: str, password: str, email: str):
         return True

 def authenticate_user(username: str, password: str):
+    """
+    Authenticates a user with a username and password
+    Args:
+        username (str): Username
+        password (str): Password
+    Returns:
+        User: User object if authentication is successful, None otherwise
+    """
     with Session() as session:
         user = session.query(User).filter(User.username == username).first()
         if user and pwd_context.verify(password, user.password_hash):
@@ -51,6 +95,17 @@ def authenticate_user(username: str, password: str):
         return None

 def save_chat_message(user_id: int, message: str, role: str, context: str):
+    """Saves a chat message to the database
+
+    Args:
+        user_id (int): User ID
+        message (str): Chat message content
+        role (str): Role of the message sender ('user' or 'assistant')
+        context (str): Context of the chat message
+
+    Returns:
+        None
+    """
     with Session() as session:
         chat = ChatHistory(
             user_id=user_id,
@@ -62,12 +117,27 @@ def save_chat_message(user_id: int, message: str, role: str, context: str):
         session.commit()

 def get_chat_history(user_id: int):
+    """
+    Retrieves chat history for a user
+    Args:
+        user_id (int): User ID
+
+    Returns:
+        List[ChatHistory]: List of chat messages
+    """
     with Session() as session:
         return session.query(ChatHistory).filter(
             ChatHistory.user_id == user_id
         ).order_by(ChatHistory.timestamp).all()

 def delete_user_chat_history(user_id: int):
+    """Deletes all chat history for a user
+    Args:
+        user_id (int): User ID
+
+    Returns:
+        None
+    """
     with Session() as session:
         session.query(ChatHistory).filter(
             ChatHistory.user_id == user_id
@@ -75,7 +145,19 @@ def delete_user_chat_history(user_id: int):
         session.commit()

 def restore_chat_history(user_id: int):
-    """Restores chat history from database to session state
+    """Restores chat history from database to session state
+    Args:
+        user_id (int): User ID
+
+    Returns:
+        List[Dict]: List of chat messages in the format:
+        {
+            "role": str,
+            "content": str,
+            "context": str,
+            "timestamp": datetime
+        }
+    """
     with Session() as session:
         history = session.query(ChatHistory).filter(
             ChatHistory.user_id == user_id
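Review note: the new `restore_chat_history` docstring documents a list of dicts with `role`, `content`, `context`, and `timestamp` keys, but the hunk ends before the conversion code. The sketch below shows one way such a conversion could look, purely as an illustration. `ChatRow` is a hypothetical stand-in for a `ChatHistory` row, `rows_to_session_messages` is an invented name, and mapping the `message` column to the `content` key is an assumption inferred from the docstring rather than confirmed by the diff.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ChatRow:
    """Hypothetical stand-in for a ChatHistory row (not the SQLAlchemy model)."""
    message: str
    role: str
    context: str
    timestamp: datetime


def rows_to_session_messages(rows):
    """Convert rows to the documented dict format, oldest message first."""
    ordered = sorted(rows, key=lambda row: row.timestamp)
    return [
        {
            "role": row.role,
            "content": row.message,  # assumed mapping: message column -> content key
            "context": row.context,
            "timestamp": row.timestamp,
        }
        for row in ordered
    ]
```

Building plain dicts while the session is still open avoids handing detached ORM objects to the UI layer, which may be the motivation for returning dicts here.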