Upload CLAUDE.md with huggingface_hub
CLAUDE.md
ADDED
@@ -0,0 +1,108 @@
# Development Guidelines

## Build & Test Commands
```
# Install dependencies
pip install -r requirements.txt
pip install -r requirements-test.txt

# Run linting
python -m ruff check .

# Run formatting
python -m ruff format .

# Type checking
python -m mypy .

# Run a specific test file
python -m pytest test_e2e.py -v

# Run a specific test function
python -m pytest test_e2e.py::test_end_to_end -v

# Deploy to Cloud Run
./deploy_rag.sh --project=YOUR_PROJECT_ID --region=YOUR_REGION

# Local development
python app.py
```

## Code Style
- **Line Length**: 100 characters max (defined in `pyproject.toml`)
- **Docstrings**: Google-style docstrings required (follow existing patterns)
- **Type Hints**: Required for all function parameters and return values
- **Imports**: Group standard library, third-party, then local imports, with a blank line between groups
- **Error Handling**: Catch specific exception types and log them
- **Linters**: Ruff with the F, E, W, D, N, C, B, Q, and A rule sets
- **Naming**: snake_case for variables and functions, CamelCase for classes
- **Environment Variables**: Use `os.environ.get()` with defaults when appropriate

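The style rules above can be illustrated with a small module. This is a minimal sketch, not code from the repository: the `CHUNK_SIZE` variable, its default of 500, and the `ChunkSizeError` and `load_chunk_size` names are all hypothetical and chosen only to show the conventions together.

```python
"""Example module laid out according to the style rules above."""

import logging
import os

# Third-party imports would go here, after a blank line.

# Local imports would go here, after another blank line.

logger = logging.getLogger(__name__)

# Hypothetical default, used only for this illustration.
DEFAULT_CHUNK_SIZE = 500


class ChunkSizeError(ValueError):
    """Raised when the configured chunk size is not a positive integer."""


def load_chunk_size(variable_name: str = "CHUNK_SIZE") -> int:
    """Read the chunk size from the environment.

    Args:
        variable_name: Name of the environment variable to read.

    Returns:
        The chunk size as a positive integer.

    Raises:
        ChunkSizeError: If the value is not a positive integer.
    """
    # os.environ.get() with a default, as the guideline suggests.
    raw_value = os.environ.get(variable_name, str(DEFAULT_CHUNK_SIZE))
    try:
        chunk_size = int(raw_value)
    except ValueError as error:
        # Catch the specific exception, log it, then re-raise a clearer one.
        logger.error("Invalid %s value: %r", variable_name, raw_value)
        raise ChunkSizeError(f"{variable_name} must be an integer") from error
    if chunk_size <= 0:
        logger.error("Non-positive %s value: %d", variable_name, chunk_size)
        raise ChunkSizeError(f"{variable_name} must be positive")
    return chunk_size
```
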
## Architecture
- Flask web application for serving RAG queries
- Google Cloud services: BigQuery, Vertex AI, Document AI, Cloud Storage
- Cloud Functions triggered by GCS events
- Cloud Run for serving the web application

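As a rough illustration of the GCS-triggered step listed above, the sketch below picks a processing route from an event payload. It assumes the payload carries `bucket` and `name` fields, as GCS object events do; the `route_gcs_event` function, the `ROUTES` table, and the route names are hypothetical and do not come from the repository.

```python
import logging
from pathlib import PurePosixPath
from typing import Optional

logger = logging.getLogger(__name__)

# Assumed mapping from file extension to a processing route; the route
# names are illustrative only.
ROUTES: dict[str, str] = {".txt": "text", ".pdf": "pdf"}


def route_gcs_event(event: dict[str, str]) -> Optional[str]:
    """Pick a processing route for a GCS object event.

    Args:
        event: Event payload; only the "bucket" and "name" keys are read.

    Returns:
        The route name, or None if the event is malformed or the file
        type is not handled.
    """
    try:
        bucket = event["bucket"]
        name = event["name"]
    except KeyError as error:
        logger.error("Event is missing field %s", error)
        return None
    # Object names use forward slashes, so parse them as POSIX paths.
    suffix = PurePosixPath(name).suffix.lower()
    route = ROUTES.get(suffix)
    if route is None:
        logger.info("Skipping gs://%s/%s: unsupported type", bucket, name)
    return route
```

A real handler would then pass the object to the matching processor; which processor handles which file type is left open here.
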
## Hugging Face Implementation Plan

### Repository Link
- GitHub: https://github.com/YOUR_USERNAME/cloud-rag-webhook

### Migration Steps
1. Create a new Hugging Face Space using the Docker SDK
2. Enable Dev Mode for VS Code access
3. Clone the GitHub repository
4. Set up environment variables for secrets
5. Configure persistent storage (20 GB purchased)

### Running on Hugging Face
1. Configure the Space to stay running continuously (persistent execution). Free-tier Spaces sleep after a period of inactivity, so this likely requires upgraded hardware with the sleep timer disabled.
2. Use "Secrets" in the Space settings for API keys and credentials
3. Set up scheduled tasks with GitHub Actions for:
   - Processing files (daily)
   - Backing up code (every 6 hours)

### Implementation Details
1. **File Storage**:
   - Store input files in Hugging Face's persistent storage
   - Use Hugging Face Datasets for managing processed data

2. **Process Automation**:
   - For "under the hood" processing:
     - Configure Space to run continuously
     - Set up GitHub Actions for scheduled tasks
     - Use Docker health checks to ensure service stays alive

3. **Deployment Architecture**:
   - Hugging Face Space = Cloud Run equivalent
   - Space will run the server continuously
   - A Dockerfile has no autoscaling settings, unlike Cloud Run; capacity is set by the hardware tier selected in the Space settings

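The Docker health check mentioned under Process Automation could be backed by a small standard-library probe like the one below. This is a sketch under assumptions: port 7860 is the default app port for Docker Spaces, but the `/health` route is hypothetical and `app.py` would have to provide it.

```python
import urllib.error
import urllib.request

# Assumed values: 7860 is the default Docker Space port; the /health
# route is hypothetical and must exist in the web application.
HEALTH_URL = "http://127.0.0.1:7860/health"


def is_healthy(url: str = HEALTH_URL, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def main() -> int:
    """Return a process exit code: 0 if healthy, 1 otherwise."""
    return 0 if is_healthy() else 1
```

A script that ends with `sys.exit(main())` could be called from a Dockerfile `HEALTHCHECK` instruction. Whether the Spaces runtime acts on the container's health status is not confirmed here, so treat this as a monitoring aid and not a restart mechanism.
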
### Key Files
- `auto_process_bucket.py`: Batch file processor
- `process_text.py`: Individual file processor
- `rag_query.py`: Query interface
- `app.py`: Web application
- `auto_backup.sh`: GitHub backup script
- `setup_all.sh`: Complete setup script

### Required Environment Variables
- `GOOGLE_APPLICATION_CREDENTIALS`: Path to the Google Cloud credentials file
- `PROJECT_ID`: Google Cloud project ID
- `BUCKET_NAME`: GCS bucket name
- `GITHUB_TOKEN`: For GitHub access
- `HF_TOKEN`: For Hugging Face API access

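Because Space secrets reach the container as environment variables, a startup check for the variables listed above can fail fast when one is missing. The sketch below is one possible shape; the `find_missing_variables` helper is hypothetical and not part of the repository.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Names taken from the list above.
REQUIRED_VARIABLES: tuple[str, ...] = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "PROJECT_ID",
    "BUCKET_NAME",
    "GITHUB_TOKEN",
    "HF_TOKEN",
)


def find_missing_variables(names: tuple[str, ...] = REQUIRED_VARIABLES) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    for name in missing:
        logger.error("Missing required environment variable: %s", name)
    return missing
```

Calling `find_missing_variables()` at the top of `app.py` and exiting when the result is non-empty would surface configuration problems before the first request arrives.
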
### Hugging Face-Specific Updates
- Update Dockerfile for Hugging Face compatibility
- Create Space UI in `app.py` using Gradio or Streamlit
- Use Hugging Face Datasets API in addition to BigQuery

## Project Goal
Create an automated RAG system that:
1. Automatically processes text/PDF files
2. Runs continuously "under the hood"
3. Provides a simple query interface
4. Backs up all code and data
5. Requires minimal maintenance