Jatin Mehra committed on
Commit 482c230 · unverified · 1 Parent(s): 65faf21

Update MiniDoc

Files changed (1)
  1. Docs/MiniDoc.md +17 -17
Docs/MiniDoc.md CHANGED
@@ -40,20 +40,20 @@ crawlgpt/
 
  ## Core Components
 
- ### LLMBasedCrawler (src/crawlgpt/core/LLMBasedCrawler.py)
+ ### [LLMBasedCrawler](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/core/LLMBasedCrawler.py) (src/crawlgpt/core/LLMBasedCrawler.py)
 
  - Main crawler class handling web content extraction and processing
  - Integrates with Groq API for language model operations
  - Manages content chunking, summarization and response generation
  - Includes rate limiting and metrics collection
 
- ### DatabaseHandler (src/crawlgpt/core/DatabaseHandler.py)
+ ### [DatabaseHandler](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/core/DatabaseHandler.py) (src/crawlgpt/core/DatabaseHandler.py)
 
  - Vector database implementation using FAISS
  - Stores and retrieves text embeddings for efficient similarity search
  - Handles data persistence and state management
 
- ### SummaryGenerator (src/crawlgpt/core/SummaryGenerator.py)
+ ### [SummaryGenerator](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/core/SummaryGenerator.py) (src/crawlgpt/core/SummaryGenerator.py)
 
  - Generates concise summaries of text chunks using Groq API
  - Configurable model selection and parameters
@@ -61,7 +61,7 @@ crawlgpt/
 
  ## UI Components
 
- ### [chat_app.py](https://orange-memory-g4xp5wqvqvr4hrvx.github.dev/?folder=%2Fworkspaces%2FCRAWLGPT) (src/crawlgpt/ui/chat_app.py)
+ ### [chat_app.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/ui/chat_app.py) (src/crawlgpt/ui/chat_app.py)
 
  - Main Streamlit application interface
  - URL processing and content extraction
@@ -69,7 +69,7 @@ crawlgpt/
  - System metrics and debug information
  - Import/export functionality
 
- ### [chat_ui.py](https://orange-memory-g4xp5wqvqvr4hrvx.github.dev/?folder=%2Fworkspaces%2FCRAWLGPT) (src/crawlgpt/ui/chat_ui.py)
+ ### [chat_ui.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/ui/chat_ui.py) (src/crawlgpt/ui/chat_ui.py)
 
  - Development/testing UI with additional debug features
  - Extended metrics visualization
@@ -77,28 +77,28 @@ crawlgpt/
 
  ## Utilities
 
- ### [content_validator.py](https://orange-memory-g4xp5wqvqvr4hrvx.github.dev/?folder=%2Fworkspaces%2FCRAWLGPT)
+ ### [content_validator.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/utils/content_validator.py) (src/crawlgpt/utils/content_validator.py)
 
  - URL and content validation
  - MIME type checking
  - Size limit enforcement
  - Security checks for malicious content
 
- ### [data_manager.py](https://orange-memory-g4xp5wqvqvr4hrvx.github.dev/?folder=%2Fworkspaces%2FCRAWLGPT)
+ ### [data_manager.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/utils/data_manager.py) (src/crawlgpt/utils/data_manager.py)
 
  - Data import/export operations
  - File serialization (JSON/pickle)
  - Timestamped backups
  - State management
 
- ### [monitoring.py](https://orange-memory-g4xp5wqvqvr4hrvx.github.dev/?folder=%2Fworkspaces%2FCRAWLGPT)
+ ### [monitoring.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/utils/monitoring.py) (src/crawlgpt/utils/monitoring.py)
 
  - Request metrics collection
  - Rate limiting implementation
  - Performance monitoring
  - Usage statistics
 
- ### [progress.py](https://orange-memory-g4xp5wqvqvr4hrvx.github.dev/?folder=%2Fworkspaces%2FCRAWLGPT)
+ ### [progress.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/utils/progress.py) (src/crawlgpt/utils/progress.py)
 
  - Operation progress tracking
  - Status updates
@@ -107,25 +107,25 @@ crawlgpt/
 
  ## Testing
 
- ### [test_database_handler.py](https://orange-memory-g4xp5wqvqvr4hrvx.github.dev/?folder=%2Fworkspaces%2FCRAWLGPT)
+ ### [test_database_handler.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/tests/test_core/test_database_handler.py) (tests/test_core/test_database_handler.py)
 
  - Tests for vector database operations
  - Integration tests for data storage/retrieval
  - End-to-end flow validation
 
- ### [test_integration.py](https://orange-memory-g4xp5wqvqvr4hrvx.github.dev/?folder=%2Fworkspaces%2FCRAWLGPT)
+ ### [test_integration.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/tests/test_core/test_integration.py) (tests/test_core/test_integration.py)
 
  - Full system integration tests
  - URL extraction to response generation flow
  - State management validation
 
- ### [test_llm_based_crawler.py](https://orange-memory-g4xp5wqvqvr4hrvx.github.dev/?folder=%2Fworkspaces%2FCRAWLGPT)
+ ### [test_llm_based_crawler.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/tests/test_core/test_llm_based_crawler.py) (tests/test_core/test_llm_based_crawler.py)
 
  - Crawler functionality tests
  - Content extraction validation
  - Response generation testing
 
- ### [test_summary_generator.py](https://orange-memory-g4xp5wqvqvr4hrvx.github.dev/?folder=%2Fworkspaces%2FCRAWLGPT)
+ ### [test_summary_generator.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/tests/test_core/test_summary_generator.py) (tests/test_core/test_summary_generator.py)
 
  - Summary generation tests
  - Empty input handling
@@ -133,21 +133,21 @@ crawlgpt/
 
  ## Configuration
 
- ### [pyproject.toml](https://orange-memory-g4xp5wqvqvr4hrvx.github.dev/?folder=%2Fworkspaces%2FCRAWLGPT)
+ ### [pyproject.toml](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/pyproject.toml)
 
  - Project metadata
  - Dependencies
  - Optional dev dependencies
  - Entry points
 
- ### [pytest.ini](https://orange-memory-g4xp5wqvqvr4hrvx.github.dev/?folder=%2Fworkspaces%2FCRAWLGPT)
+ ### [pytest.ini](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/pytest.ini)
 
  - Test configuration
  - Path settings
  - Test discovery patterns
  - Reporting options
 
- ### [setup_env.py](https://orange-memory-g4xp5wqvqvr4hrvx.github.dev/?folder=%2Fworkspaces%2FCRAWLGPT)
+ ### [setup_env.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/setup_env.py)
 
  - Environment setup script
  - Virtual environment creation
@@ -211,4 +211,4 @@ Development:
 
  ## License
 
- MIT License
+ MIT License
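
The added headings in this commit follow the pattern `### [name](<repository blob URL>/<path>) (<path>)`, while the removed ones pointed at a Codespaces `github.dev` address. A check like the one below could catch that kind of stale link before it is committed. This is only a minimal sketch: `check_heading` and `BLOB_PREFIX` are hypothetical names, not part of CRAWLGPT, and the regular expression assumes every linked heading has exactly the shape seen in the `+` lines of this diff.

```python
import re

# Assumed prefix, copied from the links added in this commit.
BLOB_PREFIX = "https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/"

# Matches "### [name](url)" with an optional trailing " (path)".
HEADING = re.compile(
    r"^### \[(?P<name>[^\]]+)\]\((?P<url>[^)]+)\)(?: \((?P<path>[^)]+)\))?$"
)


def check_heading(line: str, blob_prefix: str = BLOB_PREFIX) -> str:
    """Classify one Markdown line as 'ok', 'mismatch', or 'skipped'."""
    match = HEADING.match(line.strip())
    if match is None:
        return "skipped"  # not a linked level-3 heading
    url, path = match.group("url"), match.group("path")
    if not url.startswith(blob_prefix):
        return "mismatch"  # link points outside the repository tree
    if path is not None and url[len(blob_prefix):] != path:
        return "mismatch"  # link target and parenthesised path disagree
    return "ok"
```

Running `check_heading` over each line of `Docs/MiniDoc.md` and reporting every `"mismatch"` would flag the `github.dev` links this commit replaces, while headings without a link are skipped, not reported.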