jatinmehra committed on
Commit d32bdc1 · 1 Parent(s): 46e25d2

docs: update MiniDoc and README.md for improved clarity and feature descriptions

Files changed (4)
  1. Docs/MiniDoc.md +52 -25
  2. README.md +69 -48
  3. README_hf.md +69 -48
  4. src/crawlgpt/core/database.py +84 -2
Docs/MiniDoc.md CHANGED
@@ -10,32 +10,41 @@ CrawlGPT is a web content crawler with GPT-powered summarization and chat capabi
10
  crawlgpt/
11
  ├── src/
12
  │   └── crawlgpt/
13
- │       ├── core/
14
- │       │   ├── DatabaseHandler.py
15
- │       │   ├── LLMBasedCrawler.py
16
- │       │   └── SummaryGenerator.py
17
- │       ├── ui/
18
- │       │   ├── chat_app.py
19
- │       │   └── chat_ui.py
20
- │       └── utils/
21
- │           ├── content_validator.py
22
- │           ├── data_manager.py
23
- │           ├── helper_functions.py
24
- │           ├── monitoring.py
25
- │           └── progress.py
26
- ├── tests/


27
  │   └── test_core/
28
- │       ├── test_database_handler.py
29
- │       ├── test_integration.py
30
- │       ├── test_llm_based_crawler.py
31
- │       └── test_summary_generator.py
32
- ├── .gitignore
33
- ├── LICENSE
34
- ├── README.md
35
- ├── Docs
36
- ├── pyproject.toml
37
- ├── pytest.ini
38
- └── setup_env.py
 
 
 
 
 
 
 
39
  ```
40
 
41
  ## Core Components
@@ -59,6 +68,24 @@ crawlgpt/
59
  - Configurable model selection and parameters
60
  - Handles empty input validation
61
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
62
  ## UI Components
63
 
64
  ### [chat_app.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/ui/chat_app.py) (src/crawlgpt/ui/chat_app.py)
 
10
  crawlgpt/
11
  ├── src/
12
  │   └── crawlgpt/
13
+ │       ├── core/ # Core functionality
14
+ │       │   ├── database.py # SQL database handling
15
+ │       │   ├── LLMBasedCrawler.py # Main crawler implementation
16
+ │       │   ├── DatabaseHandler.py # Vector database (FAISS)
17
+ │       │   └── SummaryGenerator.py # Text summarization
18
+ │       ├── ui/ # User Interface
19
+ │       │   ├── chat_app.py # Main Streamlit app
20
+ │       │   ├── chat_ui.py # Development UI
21
+ │       │   └── login.py # Authentication UI
22
+ │       └── utils/ # Utilities
23
+ │           ├── content_validator.py # URL/content validation
24
+ │           ├── data_manager.py # Import/export handling
25
+ │           ├── helper_functions.py # General helpers
26
+ │           ├── monitoring.py # Metrics collection
27
+ │           └── progress.py # Progress tracking
28
+ ├── tests/ # Test suite
29
  │   └── test_core/
30
+ │       ├── test_database_handler.py # Vector DB tests
31
+ │       ├── test_integration.py # Integration tests
32
+ │       ├── test_llm_based_crawler.py # Crawler tests
33
+ │       └── test_summary_generator.py # Summarizer tests
34
+ ├── .github/ # CI/CD
35
+ │   └── workflows/
36
+ │       └── Push_to_hf.yaml # HuggingFace sync
37
+ ├── Docs/
38
+ │   └── MiniDoc.md # Documentation
39
+ ├── .dockerignore # Docker exclusions
40
+ ├── .gitignore # Git exclusions
41
+ ├── Dockerfile # Container config
42
+ ├── LICENSE # MIT License
43
+ ├── README.md # Project documentation
44
+ ├── README_hf.md # HuggingFace README
45
+ ├── pyproject.toml # Project metadata
46
+ ├── pytest.ini # Test configuration
47
+ └── setup_env.py # Environment setup
48
  ```
49
 
50
  ## Core Components
 
68
  - Configurable model selection and parameters
69
  - Handles empty input validation
70
 
71
+ ### [Database](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/core/database.py) (src/crawlgpt/core/database.py)
72
+
73
+ - SQLAlchemy-based database handling for user management and chat history
74
+ - Provides secure user authentication with BCrypt password hashing
75
+ - Manages persistent storage of chat conversations and context
76
+
77
+ - Configuration
78
+   - Uses SQLite by default (`sqlite:///crawlgpt.db`)
79
+   - Configurable via DATABASE_URL environment variable
80
+   - Automatic schema creation on startup
81
+   - Session management with SQLAlchemy sessionmaker
82
+ - Security Features
83
+   - BCrypt password hashing with PassLib
84
+   - Unique username enforcement
85
+   - Secure session handling
86
+   - Role-based message tracking
87
+
88
+
89
  ## UI Components
90
 
91
  ### [chat_app.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/ui/chat_app.py) (src/crawlgpt/ui/chat_app.py)
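
The Configuration bullets added to the Database section describe a SQLite default that the `DATABASE_URL` environment variable can override. A minimal sketch of that lookup, using only the standard library. `resolve_database_url` and `DEFAULT_DATABASE_URL` are hypothetical names introduced for illustration; `database.py` itself performs the lookup inline with `os.getenv`.

```python
import os
from typing import Mapping, Optional

# Default connection string named in the documentation above.
DEFAULT_DATABASE_URL = "sqlite:///crawlgpt.db"


def resolve_database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return DATABASE_URL from the environment, else the SQLite default.

    Hypothetical helper: the real module calls os.getenv directly.
    """
    source = os.environ if env is None else env
    return source.get("DATABASE_URL", DEFAULT_DATABASE_URL)


if __name__ == "__main__":
    # Prints whichever URL would be handed to create_engine().
    print(resolve_database_url())
```

Accepting an optional mapping keeps the lookup easy to exercise without touching the real process environment.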
README.md CHANGED
@@ -1,27 +1,42 @@
1
- # CRAWLGPT 🤖
2
 
3
- A powerful web content crawler with LLM-powered summarization and chat capabilities. CRAWLGPT extracts content from URLs, stores it in a vector database (FAISS), and enables natural language querying of the stored content. It combines modern web crawling technology with advanced language models to help you extract, analyze, and interact with web content intelligently.
4
 
5
- ## 🌟 Features
6
 
7
- - **Web Crawling**
8
- Async-based crawling powered by [crawl4ai](https://pypi.org/project/crawl4ai/) and Playwright.
9
- Includes configurable rate limiting and content validation.
10
-
11
- - **Content Processing**
12
- Automatically chunks large texts, generates embeddings, and summarizes text via the Groq API.
13
-
14
- - **Chat Interface**
15
- Streamlit-based UI with a user-friendly chat panel.
16
- Supports summarized or full-text retrieval (RAG) for context injection.
17
-
18
- - **Data Management**
19
- Stores content in a local or in-memory vector database (FAISS) for efficient retrieval.
20
- Tracks usage metrics and supports import/export of system state.
21
-
22
- - **Testing**
23
- Comprehensive unit and integration tests using Python's `unittest` framework.
24
-
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
25
 
26
  ## 🎥 Demo
27
  ### [Deployed APP 🚀🤖](https://huggingface.co/spaces/jatinmehra/CRAWL-GPT-CHAT)
@@ -109,35 +124,41 @@ _Example of CRAWLGPT in action!_
109
  crawlgpt/
110
  ├── src/
111
  │   └── crawlgpt/
112
- │       ├── core/
113
- │       │   ├── DatabaseHandler.py
114
- │       │   ├── LLMBasedCrawler.py
115
- │       │   └── SummaryGenerator.py
116
- │       ├── ui/
117
- │       │   ├── chat_app.py
118
- │       │   └── chat_ui.py
119
- │       └── utils/
120
- │           ├── content_validator.py
121
- │           ├── data_manager.py
122
- │           ├── helper_functions.py
123
- │           ├── monitoring.py
124
- │           └── progress.py
125
- ├── tests/


126
  │   └── test_core/
127
- │       ├── test_database_handler.py
128
- │       ├── test_integration.py
129
- │       ├── test_llm_based_crawler.py
130
- │       └── test_summary_generator.py
131
- ├── .github/
132
  │   └── workflows/
133
- │       └── Push_to_hf.yaml
134
- ├── .gitignore
135
- ├── LICENSE
136
- ├── README.md
137
- ├── Docs
138
- ├── pyproject.toml
139
- ├── pytest.ini
140
- └── setup_env.py
 
 
 
 
141
  ```
142
 
143
  ## 🧪 Testing
 
1
+ # CrawlGPT 🤖
2
 
3
+ A powerful web content crawler with LLM-powered RAG (Retrieval Augmented Generation) capabilities. CrawlGPT extracts content from URLs, processes it through intelligent summarization, and enables natural language interactions using modern LLM technology.
4
 
5
+ ## 🌟 Key Features
6
 
7
+ ### Core Features
8
+ - **Intelligent Web Crawling**
9
+   - Async web content extraction using Playwright
10
+   - Smart rate limiting and validation
11
+   - Configurable crawling strategies
12
+
13
+ - **Advanced Content Processing**
14
+   - Automatic text chunking and summarization
15
+   - Vector embeddings via FAISS
16
+   - Context-aware response generation
17
+
18
+ - **Streamlit Chat Interface**
19
+   - Clean, responsive UI
20
+   - Real-time content processing
21
+   - Conversation history
22
+   - User authentication
23
+
24
+ ### Technical Features
25
+ - **Vector Database**
26
+   - FAISS-powered similarity search
27
+   - Efficient content retrieval
28
+   - Persistent storage
29
+
30
+ - **User Management**
31
+   - SQLite database backend
32
+   - Secure password hashing
33
+   - Chat history tracking
34
+
35
+ - **Monitoring & Utils**
36
+   - Request metrics collection
37
+   - Progress tracking
38
+   - Data import/export
39
+   - Content validation
40
 
41
  ## 🎥 Demo
42
  ### [Deployed APP 🚀🤖](https://huggingface.co/spaces/jatinmehra/CRAWL-GPT-CHAT)
 
124
  crawlgpt/
125
  ├── src/
126
  │   └── crawlgpt/
127
+ │       ├── core/ # Core functionality
128
+ │       │   ├── database.py # SQL database handling
129
+ │       │   ├── LLMBasedCrawler.py # Main crawler implementation
130
+ │       │   ├── DatabaseHandler.py # Vector database (FAISS)
131
+ │       │   └── SummaryGenerator.py # Text summarization
132
+ │       ├── ui/ # User Interface
133
+ │       │   ├── chat_app.py # Main Streamlit app
134
+ │       │   ├── chat_ui.py # Development UI
135
+ │       │   └── login.py # Authentication UI
136
+ │       └── utils/ # Utilities
137
+ │           ├── content_validator.py # URL/content validation
138
+ │           ├── data_manager.py # Import/export handling
139
+ │           ├── helper_functions.py # General helpers
140
+ │           ├── monitoring.py # Metrics collection
141
+ │           └── progress.py # Progress tracking
142
+ ├── tests/ # Test suite
143
  │   └── test_core/
144
+ │       ├── test_database_handler.py # Vector DB tests
145
+ │       ├── test_integration.py # Integration tests
146
+ │       ├── test_llm_based_crawler.py # Crawler tests
147
+ │       └── test_summary_generator.py # Summarizer tests
148
+ ├── .github/ # CI/CD
149
  │   └── workflows/
150
+ │       └── Push_to_hf.yaml # HuggingFace sync
151
+ ├── Docs/
152
+ │   └── MiniDoc.md # Documentation
153
+ ├── .dockerignore # Docker exclusions
154
+ ├── .gitignore # Git exclusions
155
+ ├── Dockerfile # Container config
156
+ ├── LICENSE # MIT License
157
+ ├── README.md # Project documentation
158
+ ├── README_hf.md # HuggingFace README
159
+ ├── pyproject.toml # Project metadata
160
+ ├── pytest.ini # Test configuration
161
+ └── setup_env.py # Environment setup
162
  ```
163
 
164
  ## 🧪 Testing
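
The updated README lists "Automatic text chunking" without saying how text is split. One common approach is fixed-size chunks with a small overlap so that sentences straddling a boundary appear in both neighbours. The sketch below illustrates that idea only; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names, and CrawlGPT's actual chunking strategy may differ.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into character chunks that overlap by `overlap` characters.

    Hypothetical helper for illustration; not taken from the CrawlGPT source.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end, so no redundant tail is emitted.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

Character-based splitting keeps the sketch dependency-free; a token-based or sentence-aware splitter would follow the same loop structure.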
README_hf.md CHANGED
@@ -8,30 +8,45 @@ colorTo: blue
8
  pinned: true
9
  short_description: A powerful web content crawler with LLM-powered RAG.
10
  ---
11
- # CRAWLGPT 🤖
12
 
13
- A powerful web content crawler with LLM-powered summarization and chat capabilities. CRAWLGPT extracts content from URLs, stores it in a vector database (FAISS), and enables natural language querying of the stored content. It combines modern web crawling technology with advanced language models to help you extract, analyze, and interact with web content intelligently.
14
 
15
- ## 🌟 Features
16
 
17
- - **Web Crawling**
18
- Async-based crawling powered by [crawl4ai](https://pypi.org/project/crawl4ai/) and Playwright.
19
- Includes configurable rate limiting and content validation.
20
-
21
- - **Content Processing**
22
- Automatically chunks large texts, generates embeddings, and summarizes text via the Groq API.
23
-
24
- - **Chat Interface**
25
- Streamlit-based UI with a user-friendly chat panel.
26
- Supports summarized or full-text retrieval (RAG) for context injection.
27
-
28
- - **Data Management**
29
- Stores content in a local or in-memory vector database (FAISS) for efficient retrieval.
30
- Tracks usage metrics and supports import/export of system state.
31
-
32
- - **Testing**
33
- Comprehensive unit and integration tests using Python's `unittest` framework.
34
-
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
35
 
36
  ## 🎥 Demo
37
  ### [Deployed APP 🚀🤖](https://huggingface.co/spaces/jatinmehra/CRAWL-GPT-CHAT)
@@ -119,35 +134,41 @@ _Example of CRAWLGPT in action!_
119
  crawlgpt/
120
  ├── src/
121
  │   └── crawlgpt/
122
- │       ├── core/
123
- │       │   ├── DatabaseHandler.py
124
- │       │   ├── LLMBasedCrawler.py
125
- │       │   └── SummaryGenerator.py
126
- │       ├── ui/
127
- │       │   ├── chat_app.py
128
- │       │   └── chat_ui.py
129
- │       └── utils/
130
- │           ├── content_validator.py
131
- │           ├── data_manager.py
132
- │           ├── helper_functions.py
133
- │           ├── monitoring.py
134
- │           └── progress.py
135
- ├── tests/


136
  │   └── test_core/
137
- │       ├── test_database_handler.py
138
- │       ├── test_integration.py
139
- │       ├── test_llm_based_crawler.py
140
- │       └── test_summary_generator.py
141
- ├── .github/
142
  │   └── workflows/
143
- │       └── Push_to_hf.yaml
144
- ├── .gitignore
145
- ├── LICENSE
146
- ├── README.md
147
- ├── Docs
148
- ├── pyproject.toml
149
- ├── pytest.ini
150
- └── setup_env.py
 
 
 
 
151
  ```
152
 
153
  ## 🧪 Testing
 
8
  pinned: true
9
  short_description: A powerful web content crawler with LLM-powered RAG.
10
  ---
11
+ # CrawlGPT 🤖
12
 
13
+ A powerful web content crawler with LLM-powered RAG (Retrieval Augmented Generation) capabilities. CrawlGPT extracts content from URLs, processes it through intelligent summarization, and enables natural language interactions using modern LLM technology.
14
 
15
+ ## 🌟 Key Features
16
 
17
+ ### Core Features
18
+ - **Intelligent Web Crawling**
19
+   - Async web content extraction using Playwright
20
+   - Smart rate limiting and validation
21
+   - Configurable crawling strategies
22
+
23
+ - **Advanced Content Processing**
24
+   - Automatic text chunking and summarization
25
+   - Vector embeddings via FAISS
26
+   - Context-aware response generation
27
+
28
+ - **Streamlit Chat Interface**
29
+   - Clean, responsive UI
30
+   - Real-time content processing
31
+   - Conversation history
32
+   - User authentication
33
+
34
+ ### Technical Features
35
+ - **Vector Database**
36
+   - FAISS-powered similarity search
37
+   - Efficient content retrieval
38
+   - Persistent storage
39
+
40
+ - **User Management**
41
+   - SQLite database backend
42
+   - Secure password hashing
43
+   - Chat history tracking
44
+
45
+ - **Monitoring & Utils**
46
+   - Request metrics collection
47
+   - Progress tracking
48
+   - Data import/export
49
+   - Content validation
50
 
51
  ## 🎥 Demo
52
  ### [Deployed APP 🚀🤖](https://huggingface.co/spaces/jatinmehra/CRAWL-GPT-CHAT)
 
134
  crawlgpt/
135
  ├── src/
136
  │   └── crawlgpt/
137
+ │       ├── core/ # Core functionality
138
+ │       │   ├── database.py # SQL database handling
139
+ │       │   ├── LLMBasedCrawler.py # Main crawler implementation
140
+ │       │   ├── DatabaseHandler.py # Vector database (FAISS)
141
+ │       │   └── SummaryGenerator.py # Text summarization
142
+ │       ├── ui/ # User Interface
143
+ │       │   ├── chat_app.py # Main Streamlit app
144
+ │       │   ├── chat_ui.py # Development UI
145
+ │       │   └── login.py # Authentication UI
146
+ │       └── utils/ # Utilities
147
+ │           ├── content_validator.py # URL/content validation
148
+ │           ├── data_manager.py # Import/export handling
149
+ │           ├── helper_functions.py # General helpers
150
+ │           ├── monitoring.py # Metrics collection
151
+ │           └── progress.py # Progress tracking
152
+ ├── tests/ # Test suite
153
  │   └── test_core/
154
+ │       ├── test_database_handler.py # Vector DB tests
155
+ │       ├── test_integration.py # Integration tests
156
+ │       ├── test_llm_based_crawler.py # Crawler tests
157
+ │       └── test_summary_generator.py # Summarizer tests
158
+ ├── .github/ # CI/CD
159
  │   └── workflows/
160
+ │       └── Push_to_hf.yaml # HuggingFace sync
161
+ ├── Docs/
162
+ │   └── MiniDoc.md # Documentation
163
+ ├── .dockerignore # Docker exclusions
164
+ ├── .gitignore # Git exclusions
165
+ ├── Dockerfile # Container config
166
+ ├── LICENSE # MIT License
167
+ ├── README.md # Project documentation
168
+ ├── README_hf.md # HuggingFace README
169
+ ├── pyproject.toml # Project metadata
170
+ ├── pytest.ini # Test configuration
171
+ └── setup_env.py # Environment setup
172
  ```
173
 
174
  ## 🧪 Testing
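
The feature list mentions "FAISS-powered similarity search". Conceptually, an exact (flat) vector index compares the query embedding against every stored vector and returns the closest ones. The pure-Python sketch below shows that brute-force idea as a stand-in; it does not use FAISS, and `squared_l2` and `search` are hypothetical names. Which FAISS index type CrawlGPT actually uses is not stated in this diff.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(
    vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k vectors nearest to query.

    Brute-force stand-in for a vector index; illustration only.
    """
    scored = [(i, squared_l2(v, query)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


if __name__ == "__main__":
    stored = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
    for idx, dist in search(stored, [0.9, 0.1], k=2):
        print(idx, dist)
```

The returned indices would then be mapped back to the stored text chunks to build the context passed to the LLM.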
src/crawlgpt/core/database.py CHANGED
@@ -1,4 +1,6 @@
1
- # crawlgpt/src/crawlgpt/core/database.py
 
 
2
  from sqlalchemy import create_engine, Column, Integer, String, Text, DateTime, ForeignKey
3
  from sqlalchemy.ext.declarative import declarative_base
4
  from sqlalchemy.orm import sessionmaker, relationship
@@ -6,10 +8,23 @@ from datetime import datetime
6
  from passlib.context import CryptContext
7
  import os
8
 
 
9
  Base = declarative_base()
 
 
10
  pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
11
 
12
  class User(Base):
 
 
 
 
 
 
 
 
 
 
13
      __tablename__ = 'users'
14
      id = Column(Integer, primary_key=True)
15
      username = Column(String(50), unique=True)
@@ -19,6 +34,17 @@ class User(Base):
19
      chats = relationship("ChatHistory", back_populates="user")
20
 
21
  class ChatHistory(Base):
 
 
 
 
 
 
 
 
 
 
 
22
      __tablename__ = 'chat_history'
23
      id = Column(Integer, primary_key=True)
24
      user_id = Column(Integer, ForeignKey('users.id'))
@@ -28,12 +54,22 @@ class ChatHistory(Base):
28
      timestamp = Column(DateTime, default=datetime.utcnow)
29
      user = relationship("User", back_populates="chats")
30
 
 
31
  engine = create_engine(os.getenv('DATABASE_URL', 'sqlite:///crawlgpt.db'))
32
  Base.metadata.create_all(bind=engine)
33
  Session = sessionmaker(bind=engine)
34
 
35
  # Database operations
36
  def create_user(username: str, password: str, email: str):
 
 
 
 
 
 
 
 
 
37
      with Session() as session:
38
          if session.query(User).filter(User.username == username).first():
39
              return False
@@ -44,6 +80,14 @@ def create_user(username: str, password: str, email: str):
44
          return True
45
 
46
  def authenticate_user(username: str, password: str):
 
 
 
 
 
 
 
 
47
      with Session() as session:
48
          user = session.query(User).filter(User.username == username).first()
49
          if user and pwd_context.verify(password, user.password_hash):
@@ -51,6 +95,17 @@ def authenticate_user(username: str, password: str):
51
          return None
52
 
53
  def save_chat_message(user_id: int, message: str, role: str, context: str):
 
 
 
 
 
 
 
 
 
 
 
54
      with Session() as session:
55
          chat = ChatHistory(
56
              user_id=user_id,
@@ -62,12 +117,27 @@ def save_chat_message(user_id: int, message: str, role: str, context: str):
62
          session.commit()
63
 
64
  def get_chat_history(user_id: int):
 
 
 
 
 
 
 
 
65
      with Session() as session:
66
          return session.query(ChatHistory).filter(
67
              ChatHistory.user_id == user_id
68
          ).order_by(ChatHistory.timestamp).all()
69
 
70
  def delete_user_chat_history(user_id: int):
 
 
 
 
 
 
 
71
      with Session() as session:
72
          session.query(ChatHistory).filter(
73
              ChatHistory.user_id == user_id
@@ -75,7 +145,19 @@ def delete_user_chat_history(user_id: int):
75
          session.commit()
76
 
77
  def restore_chat_history(user_id: int):
78
-     """Restores chat history from database to session state"""
 
 
 
 
 
 
 
 
 
 
 
 
79
      with Session() as session:
80
          history = session.query(ChatHistory).filter(
81
              ChatHistory.user_id == user_id
 
1
+ # This module provides SQLAlchemy models and database utilities for user management
2
+ # and chat history persistence.
3
+
4
  from sqlalchemy import create_engine, Column, Integer, String, Text, DateTime, ForeignKey
5
  from sqlalchemy.ext.declarative import declarative_base
6
  from sqlalchemy.orm import sessionmaker, relationship
 
8
  from passlib.context import CryptContext
9
  import os
10
 
11
+ # SQLAlchemy models
12
  Base = declarative_base()
13
+
14
+ # Password hashing
15
  pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
16
 
17
  class User(Base):
18
+     """User model for authentication and chat history tracking.
19
+
20
+     Attributes:
21
+         id (int): Primary key
22
+         username (str): Unique username, max 50 chars
23
+         password_hash (str): BCrypt hashed password, 60 chars
24
+         email (str): User email, max 100 chars
25
+         created_at (datetime): Account creation timestamp
26
+         chats (relationship): One-to-many relationship to ChatHistory
27
+     """
28
      __tablename__ = 'users'
29
      id = Column(Integer, primary_key=True)
30
      username = Column(String(50), unique=True)

34
      chats = relationship("ChatHistory", back_populates="user")
35
 
36
  class ChatHistory(Base):
37
+     """ChatHistory model for storing chat messages.
38
+
39
+     Attributes:
40
+         id (int): Primary key
41
+         user_id (int): Foreign key to User
42
+         message (str): Chat message content
43
+         role (str): Role of the message sender ('user' or 'assistant')
44
+         context (str): Context of the chat message
45
+         timestamp (datetime): Timestamp of the message
46
+         user (relationship): Many-to-one relationship to User
47
+     """
48
      __tablename__ = 'chat_history'
49
      id = Column(Integer, primary_key=True)
50
      user_id = Column(Integer, ForeignKey('users.id'))

54
      timestamp = Column(DateTime, default=datetime.utcnow)
55
      user = relationship("User", back_populates="chats")
56
 
57
+ # Database initialization
58
  engine = create_engine(os.getenv('DATABASE_URL', 'sqlite:///crawlgpt.db'))
59
  Base.metadata.create_all(bind=engine)
60
  Session = sessionmaker(bind=engine)
61
 
62
  # Database operations
63
  def create_user(username: str, password: str, email: str):
64
+     """
65
+     Creates a new user in the database
66
+     Args:
67
+         username (str): Username
68
+         password (str): Password
69
+         email (str): Email
70
+     Returns:
71
+         bool: True if user is created, False if username is taken
72
+     """
73
      with Session() as session:
74
          if session.query(User).filter(User.username == username).first():
75
              return False

80
          return True
81
 
82
  def authenticate_user(username: str, password: str):
83
+     """
84
+     Authenticates a user with a username and password
85
+     Args:
86
+         username (str): Username
87
+         password (str): Password
88
+     Returns:
89
+         User: User object if authentication is successful, None otherwise
90
+     """
91
      with Session() as session:
92
          user = session.query(User).filter(User.username == username).first()
93
          if user and pwd_context.verify(password, user.password_hash):

95
          return None
96
 
97
  def save_chat_message(user_id: int, message: str, role: str, context: str):
98
+     """Saves a chat message to the database
99
+
100
+     Args:
101
+         user_id (int): User ID
102
+         message (str): Chat message content
103
+         role (str): Role of the message sender ('user' or 'assistant')
104
+         context (str): Context of the chat message
105
+
106
+     Returns:
107
+         None
108
+     """
109
      with Session() as session:
110
          chat = ChatHistory(
111
              user_id=user_id,

117
          session.commit()
118
 
119
  def get_chat_history(user_id: int):
120
+     """
121
+     Retrieves chat history for a user
122
+     Args:
123
+         user_id (int): User ID
124
+
125
+     Returns:
126
+         List[ChatHistory]: List of chat messages
127
+     """
128
      with Session() as session:
129
          return session.query(ChatHistory).filter(
130
              ChatHistory.user_id == user_id
131
          ).order_by(ChatHistory.timestamp).all()
132
 
133
  def delete_user_chat_history(user_id: int):
134
+     """Deletes all chat history for a user
135
+     Args:
136
+         user_id (int): User ID
137
+
138
+     Returns:
139
+         None
140
+     """
141
      with Session() as session:
142
          session.query(ChatHistory).filter(
143
              ChatHistory.user_id == user_id

145
          session.commit()
146
 
147
  def restore_chat_history(user_id: int):
148
+     """Restores chat history from database to session state
149
+     Args:
150
+         user_id (int): User ID
151
+
152
+     Returns:
153
+         List[Dict]: List of chat messages in the format:
154
+         {
155
+             "role": str,
156
+             "content": str,
157
+             "context": str,
158
+             "timestamp": datetime
159
+         }
160
+     """
161
      with Session() as session:
162
          history = session.query(ChatHistory).filter(
163
              ChatHistory.user_id == user_id
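
The `restore_chat_history` docstring documents the returned dictionary shape, but the rest of the function body is cut off in this hunk. The sketch below shows only the row-to-dictionary conversion that the docstring implies. `ChatRow` is a hypothetical stand-in for the `ChatHistory` ORM model, `rows_to_messages` is a hypothetical helper, and both the `message` to `"content"` mapping and the ordering by timestamp are assumptions based on the model attributes and on `get_chat_history`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class ChatRow:
    """Hypothetical stand-in for the ChatHistory ORM model."""
    message: str
    role: str
    context: str
    timestamp: datetime


def rows_to_messages(rows: List[ChatRow]) -> List[Dict[str, Any]]:
    """Convert stored rows to the dict format described in the docstring.

    Assumption: rows are returned oldest first, and the model's `message`
    column is exposed under the "content" key.
    """
    ordered = sorted(rows, key=lambda row: row.timestamp)
    return [
        {
            "role": row.role,
            "content": row.message,
            "context": row.context,
            "timestamp": row.timestamp,
        }
        for row in ordered
    ]


if __name__ == "__main__":
    sample = [
        ChatRow("Hi there", "assistant", "", datetime(2025, 1, 1, 12, 1)),
        ChatRow("Hello", "user", "", datetime(2025, 1, 1, 12, 0)),
    ]
    for item in rows_to_messages(sample):
        print(item["role"], item["content"])
```

Returning plain dictionaries rather than ORM objects means the result stays usable after the database session has closed.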