Commit: Update MiniDoc
Committed by: Jatin Mehra
Changed: Docs/MiniDoc.md (+17 -17)
@@ -40,20 +40,20 @@ crawlgpt/

 ## Core Components

-### LLMBasedCrawler (src/crawlgpt/core/LLMBasedCrawler.py)

 - Main crawler class handling web content extraction and processing
 - Integrates with Groq API for language model operations
 - Manages content chunking, summarization and response generation
 - Includes rate limiting and metrics collection

-### DatabaseHandler (src/crawlgpt/core/DatabaseHandler.py)

 - Vector database implementation using FAISS
 - Stores and retrieves text embeddings for efficient similarity search
 - Handles data persistence and state management

-### SummaryGenerator (src/crawlgpt/core/SummaryGenerator.py)

 - Generates concise summaries of text chunks using Groq API
 - Configurable model selection and parameters
@@ -61,7 +61,7 @@ crawlgpt/

 ## UI Components

-### [chat_app.py](https://

 - Main Streamlit application interface
 - URL processing and content extraction
@@ -69,7 +69,7 @@ crawlgpt/
 - System metrics and debug information
 - Import/export functionality

-### [chat_ui.py](https://

 - Development/testing UI with additional debug features
 - Extended metrics visualization
@@ -77,28 +77,28 @@ crawlgpt/

 ## Utilities

-### [content_validator.py](https://

 - URL and content validation
 - MIME type checking
 - Size limit enforcement
 - Security checks for malicious content

-### [data_manager.py](https://

 - Data import/export operations
 - File serialization (JSON/pickle)
 - Timestamped backups
 - State management

-### [monitoring.py](https://

 - Request metrics collection
 - Rate limiting implementation
 - Performance monitoring
 - Usage statistics

-### [progress.py](https://

 - Operation progress tracking
 - Status updates
@@ -107,25 +107,25 @@ crawlgpt/

 ## Testing

-### [test_database_handler.py](https://

 - Tests for vector database operations
 - Integration tests for data storage/retrieval
 - End-to-end flow validation

-### [test_integration.py](https://

 - Full system integration tests
 - URL extraction to response generation flow
 - State management validation

-### [test_llm_based_crawler.py](https://

 - Crawler functionality tests
 - Content extraction validation
 - Response generation testing

-### [test_summary_generator.py](https://

 - Summary generation tests
 - Empty input handling
@@ -133,21 +133,21 @@ crawlgpt/

 ## Configuration

-### [pyproject.toml](https://

 - Project metadata
 - Dependencies
 - Optional dev dependencies
 - Entry points

-### [pytest.ini](https://

 - Test configuration
 - Path settings
 - Test discovery patterns
 - Reporting options

-### [setup_env.py](https://

 - Environment setup script
 - Virtual environment creation
@@ -211,4 +211,4 @@ Development:

 ## License

-MIT License
@@ -40,20 +40,20 @@ crawlgpt/

 ## Core Components

+### [LLMBasedCrawler](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/core/LLMBasedCrawler.py) (src/crawlgpt/core/LLMBasedCrawler.py)

 - Main crawler class handling web content extraction and processing
 - Integrates with Groq API for language model operations
 - Manages content chunking, summarization and response generation
 - Includes rate limiting and metrics collection

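The LLMBasedCrawler entry above mentions content chunking, but the diff does not show how it is done. As a rough illustration only, fixed-size chunking with overlap might look like the sketch below. The function name `chunk_text`, its parameters, and the default sizes are hypothetical and are not taken from the CRAWLGPT source.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into character windows that overlap by `overlap` characters.

    Hypothetical helper for illustration; not the project's implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk consists purely of overlap.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

Overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated text sent to the summarizer.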
+### [DatabaseHandler](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/core/DatabaseHandler.py) (src/crawlgpt/core/DatabaseHandler.py)

 - Vector database implementation using FAISS
 - Stores and retrieves text embeddings for efficient similarity search
 - Handles data persistence and state management

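To make "similarity search over stored embeddings" concrete, here is a minimal sketch that deliberately does not use FAISS: it is a brute-force cosine-similarity search in pure Python, standing in for what an index lookup returns. The class name `TinyVectorStore` and its methods are hypothetical, and the vectors are toy values rather than real embeddings.

```python
import math


def _cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical stand-in for a vector database: linear scan, not FAISS."""

    def __init__(self):
        self._vectors = []
        self._texts = []

    def add(self, vector, text):
        # Store the embedding alongside the text it was computed from.
        self._vectors.append(list(vector))
        self._texts.append(text)

    def search(self, query, k=3):
        # Score every stored vector against the query, highest similarity first.
        scored = [(_cosine(query, v), t) for v, t in zip(self._vectors, self._texts)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

A linear scan is fine for a handful of chunks; a dedicated index such as FAISS matters once the number of stored embeddings grows large.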
+### [SummaryGenerator](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/core/SummaryGenerator.py) (src/crawlgpt/core/SummaryGenerator.py)

 - Generates concise summaries of text chunks using Groq API
 - Configurable model selection and parameters
@@ -61,7 +61,7 @@ crawlgpt/

 ## UI Components

+### [chat_app.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/ui/chat_app.py) (src/crawlgpt/ui/chat_app.py)

 - Main Streamlit application interface
 - URL processing and content extraction
@@ -69,7 +69,7 @@ crawlgpt/
 - System metrics and debug information
 - Import/export functionality

+### [chat_ui.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/ui/chat_ui.py) (src/crawlgpt/ui/chat_ui.py)

 - Development/testing UI with additional debug features
 - Extended metrics visualization
@@ -77,28 +77,28 @@ crawlgpt/

 ## Utilities

+### [content_validator.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/utils/content_validator.py) (src/crawlgpt/utils/content_validator.py)

 - URL and content validation
 - MIME type checking
 - Size limit enforcement
 - Security checks for malicious content

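The first three checks listed for content_validator.py could be sketched roughly as follows. This is an assumption-laden illustration, not the module's code: the function names, the allowed MIME types, and the 1 MB limit are all hypothetical, and the malicious-content checks are left out because the diff gives no detail about them.

```python
from urllib.parse import urlparse

# Assumed values for illustration; the real module may use different ones.
ALLOWED_MIME_TYPES = {"text/html", "text/plain"}
MAX_CONTENT_BYTES = 1_000_000


def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and bool(parsed.netloc)


def is_allowed_content(content_type: str, body: bytes) -> bool:
    """Check the MIME type (ignoring parameters such as charset) and the size limit."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in ALLOWED_MIME_TYPES and len(body) <= MAX_CONTENT_BYTES
```

Checking the declared `Content-Type` before reading the body lets a crawler reject unsupported responses early; the size check still has to run on the bytes actually received.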
+### [data_manager.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/utils/data_manager.py) (src/crawlgpt/utils/data_manager.py)

 - Data import/export operations
 - File serialization (JSON/pickle)
 - Timestamped backups
 - State management

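One plausible shape for export with timestamped backups is sketched below, using JSON only (the pickle path mentioned above is not shown). `export_state` and `import_state` are hypothetical names, and the backup naming scheme is an assumption rather than what data_manager.py actually does.

```python
import json
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def export_state(state: dict, path: Path) -> Optional[Path]:
    """Write state as JSON; if the file already exists, copy it to a timestamped backup first.

    Returns the backup path, or None when there was nothing to back up.
    """
    backup = None
    if path.exists():
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        backup = path.with_name(f"{path.stem}_{stamp}{path.suffix}")
        shutil.copy2(path, backup)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return backup


def import_state(path: Path) -> dict:
    """Load previously exported JSON state."""
    return json.loads(path.read_text(encoding="utf-8"))
```

With second-level timestamps, two exports within the same second would reuse one backup name; a real implementation would likely need a finer-grained or unique suffix.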
+### [monitoring.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/utils/monitoring.py) (src/crawlgpt/utils/monitoring.py)

 - Request metrics collection
 - Rate limiting implementation
 - Performance monitoring
 - Usage statistics

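The diff does not say which rate-limiting algorithm monitoring.py uses. A sliding-window limiter is one common choice, sketched here under that assumption; the class name `RateLimiter`, its constructor arguments, and the injectable `clock` are hypothetical.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` within any `period_seconds` window (sliding window)."""

    def __init__(self, max_calls: int, period_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period_seconds
        self._clock = clock      # injectable so tests do not need to sleep
        self._calls = deque()    # timestamps of accepted calls, oldest first

    def allow(self) -> bool:
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False
```

A caller would check `limiter.allow()` before each outbound API request and either wait or reject when it returns `False`.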
+### [progress.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/src/crawlgpt/utils/progress.py) (src/crawlgpt/utils/progress.py)

 - Operation progress tracking
 - Status updates
@@ -107,25 +107,25 @@ crawlgpt/

 ## Testing

+### [test_database_handler.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/tests/test_core/test_database_handler.py) (tests/test_core/test_database_handler.py)

 - Tests for vector database operations
 - Integration tests for data storage/retrieval
 - End-to-end flow validation

+### [test_integration.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/tests/test_core/test_integration.py) (tests/test_core/test_integration.py)

 - Full system integration tests
 - URL extraction to response generation flow
 - State management validation

+### [test_llm_based_crawler.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/tests/test_core/test_llm_based_crawler.py) (tests/test_core/test_llm_based_crawler.py)

 - Crawler functionality tests
 - Content extraction validation
 - Response generation testing

+### [test_summary_generator.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/tests/test_core/test_summary_generator.py) (tests/test_core/test_summary_generator.py)

 - Summary generation tests
 - Empty input handling
@@ -133,21 +133,21 @@ crawlgpt/

 ## Configuration

+### [pyproject.toml](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/pyproject.toml)

 - Project metadata
 - Dependencies
 - Optional dev dependencies
 - Entry points

+### [pytest.ini](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/pytest.ini)

 - Test configuration
 - Path settings
 - Test discovery patterns
 - Reporting options

+### [setup_env.py](https://github.com/Jatin-Mehra119/CRAWLGPT/blob/main/setup_env.py)

 - Environment setup script
 - Virtual environment creation
@@ -211,4 +211,4 @@ Development:

 ## License

+MIT License