Commit a1b5712 by Ultronprime · verified · 1 parent: b2eeb78

Upload CLAUDE.md with huggingface_hub

Files changed (1)
  1. CLAUDE.md +108 -0
CLAUDE.md ADDED
@@ -0,0 +1,108 @@
+ # Development Guidelines
+
+ ## Build & Test Commands
+ ```
+ # Install dependencies
+ pip install -r requirements.txt
+ pip install -r requirements-test.txt
+
+ # Run linting
+ python -m ruff check .
+
+ # Run formatting
+ python -m ruff format .
+
+ # Type checking
+ python -m mypy .
+
+ # Run a specific test file
+ python -m pytest test_e2e.py -v
+
+ # Run a specific test function
+ python -m pytest test_e2e.py::test_end_to_end -v
+
+ # Deploy to Cloud Run
+ ./deploy_rag.sh --project=YOUR_PROJECT_ID --region=YOUR_REGION
+
+ # Local development
+ python app.py
+ ```
+
+ ## Code Style
+ - **Line Length**: 100 characters max (defined in `pyproject.toml`)
+ - **Docstrings**: Google-style docstrings required (follow existing patterns)
+ - **Type Hints**: Required for all function parameters and return values
+ - **Imports**: Group standard library, third-party, then local imports, with a blank line between groups
+ - **Error Handling**: Catch specific exception types and log them
+ - **Linters**: Ruff for linting (F, E, W, D, N, C, B, Q, A rules)
+ - **Naming**: snake_case for variables/functions, CamelCase for classes
+ - **Environment Variables**: Use `os.environ.get()` with defaults when appropriate
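
The style rules above can be illustrated with a short module. This is only a sketch: `ChunkConfig`, `load_chunk_config`, the `CHUNK_SIZE` variable, and its default of 500 are hypothetical names invented for the example and do not come from the repository. It uses only the standard library, so the third-party and local import groups are empty here.

```python
"""Illustrative module following the code style rules (hypothetical example)."""

import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical default, used only for illustration.
DEFAULT_CHUNK_SIZE = 500


class ChunkConfig:
    """Holds chunking settings read from the environment.

    Attributes:
        chunk_size: Maximum number of characters per chunk.
    """

    def __init__(self, chunk_size: int) -> None:
        """Initializes the config.

        Args:
            chunk_size: Maximum number of characters per chunk.
        """
        self.chunk_size = chunk_size


def load_chunk_config() -> ChunkConfig:
    """Builds a ChunkConfig from the CHUNK_SIZE environment variable.

    Returns:
        A ChunkConfig using CHUNK_SIZE, or the default when it is unset or invalid.
    """
    # os.environ.get() with a default, as the style guide asks.
    raw_value = os.environ.get("CHUNK_SIZE", str(DEFAULT_CHUNK_SIZE))
    try:
        return ChunkConfig(chunk_size=int(raw_value))
    except ValueError:
        # Catch the specific exception type and log it instead of failing silently.
        logger.warning("Invalid CHUNK_SIZE %r, using default %d", raw_value, DEFAULT_CHUNK_SIZE)
        return ChunkConfig(chunk_size=DEFAULT_CHUNK_SIZE)
```

The example catches only `ValueError`, the one failure `int()` can raise on a string, rather than a bare `except`.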
+
+ ## Architecture
+ - Flask web application for serving RAG queries
+ - Google Cloud services: BigQuery, Vertex AI, Document AI, Cloud Storage
+ - Cloud Functions triggered by GCS events
+ - Cloud Run for serving the web application
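
As a rough illustration of the "Cloud Functions triggered by GCS events" piece, the sketch below shows how a handler might turn an upload event into a `gs://` URI. It assumes the background-function signature `(event, context)` and reads the `bucket` and `name` keys from the event payload. The function names and the choice to accept only `.txt` and `.pdf` files are assumptions made for the example, not the repository's actual code.

```python
import logging
from pathlib import PurePosixPath
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)

# Assumption: only text and PDF uploads are processed.
SUPPORTED_SUFFIXES = {".txt", ".pdf"}


def gcs_uri_for_event(event: Dict[str, Any]) -> Optional[str]:
    """Returns the gs:// URI for a supported upload, or None if it should be skipped.

    Args:
        event: GCS event payload; only the "bucket" and "name" keys are read.
    """
    bucket = event.get("bucket")
    name = event.get("name")
    if not bucket or not name:
        logger.warning("Event is missing bucket or name: %r", event)
        return None
    if PurePosixPath(name).suffix.lower() not in SUPPORTED_SUFFIXES:
        logger.info("Skipping unsupported file: %s", name)
        return None
    return f"gs://{bucket}/{name}"


def on_file_uploaded(event: Dict[str, Any], context: Any) -> None:
    """Background-function style entry point (hypothetical name).

    Args:
        event: GCS event payload.
        context: Event metadata supplied by the runtime; unused here.
    """
    uri = gcs_uri_for_event(event)
    if uri is not None:
        # A real handler would hand the URI to the document processor here.
        logger.info("Would process %s", uri)
```

Keeping the URI logic in a plain function makes it testable without any Google Cloud client libraries.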
+
+ ## Hugging Face Implementation Plan
+
+ ### Repository Link
+ - GitHub: https://github.com/YOUR_USERNAME/cloud-rag-webhook
+
+ ### Migration Steps
+ 1. Create a new Hugging Face Space with the Docker SDK
+ 2. Enable Dev Mode for VS Code access
+ 3. Clone the GitHub repository
+ 4. Set up environment variables for secrets
+ 5. Configure persistent storage (20 GB purchased)
+
+ ### Running on Hugging Face
+ 1. Configure Space to always stay running (persistent execution)
+ 2. Use "Secrets" in Space settings for API keys and credentials
+ 3. Set up scheduled tasks with GitHub Actions for:
+    - Processing files (daily)
+    - Backing up code (every 6 hours)
+
+ ### Implementation Details
+ 1. **File Storage**:
+    - Store input files in Hugging Face's persistent storage
+    - Use Hugging Face Datasets for managing processed data
+
+ 2. **Process Automation**:
+    - For "under the hood" processing:
+      - Configure Space to run continuously
+      - Set up GitHub Actions for scheduled tasks
+      - Use Docker health checks to ensure service stays alive
+
+ 3. **Deployment Architecture**:
+    - Hugging Face Space = Cloud Run equivalent
+    - Space will run the server continuously
+    - Choose hardware and sleep behavior in the Space settings (a Dockerfile cannot configure autoscaling)
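
The "Docker health checks" item above could be backed by a small probe script that a Dockerfile `HEALTHCHECK` instruction calls. The sketch below is one possible shape, under stated assumptions: the app listens on port 7860 (the default port for Docker Spaces) and exposes a `/health` route, which is a hypothetical path not confirmed by the source. Whether the Space platform itself acts on `HEALTHCHECK` results is also not confirmed here.

```python
import urllib.request

# Assumptions: port 7860 and a hypothetical /health route.
DEFAULT_HEALTH_URL = "http://localhost:7860/health"


def is_healthy(url: str = DEFAULT_HEALTH_URL, timeout: float = 5.0) -> bool:
    """Returns True if the URL answers with HTTP 200 within the timeout.

    Args:
        url: Address of the health endpoint to probe.
        timeout: Seconds to wait before treating the service as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def main() -> int:
    """Returns 0 when healthy and 1 otherwise, the exit codes HEALTHCHECK expects."""
    return 0 if is_healthy() else 1
```

A probe script would end with `sys.exit(main())` so that Docker sees the exit code. Using only `urllib` keeps the probe free of extra dependencies in the image.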
+
+ ### Key Files
+ - `auto_process_bucket.py`: Batch file processor
+ - `process_text.py`: Individual file processor
+ - `rag_query.py`: Query interface
+ - `app.py`: Web application
+ - `auto_backup.sh`: GitHub backup script
+ - `setup_all.sh`: Complete setup script
+
+ ### Required Environment Variables
+ - `GOOGLE_APPLICATION_CREDENTIALS`: Path to the Google Cloud service account key file
+ - `PROJECT_ID`: Google Cloud project ID
+ - `BUCKET_NAME`: GCS bucket name
+ - `GITHUB_TOKEN`: For GitHub access
+ - `HF_TOKEN`: For Hugging Face API access
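
Because the service depends on all five variables, it may help to fail fast at startup when one is missing. The following is a minimal sketch of such a check. The helper names `missing_env_vars` and `require_env_vars` are hypothetical; only the variable names come from the list above.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_ENV_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "PROJECT_ID",
    "BUCKET_NAME",
    "GITHUB_TOKEN",
    "HF_TOKEN",
)


def missing_env_vars(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Returns the names of required variables that are unset or empty.

    Args:
        environ: Mapping to inspect; defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def require_env_vars(environ: Optional[Mapping[str, str]] = None) -> None:
    """Raises RuntimeError listing every missing required variable.

    Args:
        environ: Mapping to inspect; defaults to os.environ.

    Raises:
        RuntimeError: If one or more required variables are unset or empty.
    """
    missing = missing_env_vars(environ)
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
```

Calling `require_env_vars()` once at the top of `app.py` would report all missing names in a single error instead of failing on the first lookup.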
+
+ ### Hugging Face-Specific Updates
+ - Update Dockerfile for Hugging Face compatibility
+ - Create Space UI in `app.py` using Gradio or Streamlit
+ - Use Hugging Face Datasets API in addition to BigQuery
+
+ ## Project Goal
+ Create an automated RAG system that:
+ 1. Automatically processes text/PDF files
+ 2. Runs continuously "under the hood"
+ 3. Provides a simple query interface
+ 4. Backs up all code and data
+ 5. Requires minimal maintenance