---
title: Document Processor
emoji: π
colorFrom: yellow
colorTo: blue
sdk: docker
pinned: false
---
# Appian Credit Union - Smart Document Processor AI

## Problem Statement
Appian Credit Union receives thousands of PDF documents daily that must be classified, verified, and organized. This solution automates that process using AI, reducing manual effort and processing time.

## Innovation Highlights
- Hierarchical document classification system
- Intelligent person-document association
- Automated metadata extraction
- Batch processing capabilities
- Modern, intuitive UI

## Document Types Supported
- Bank Account Applications
  - Credit Card Applications
  - Savings Account Applications
- Identity Documents
  - Driver's License
  - State/Country ID
  - Passport
- Financial Documents
  - Income Statements
  - Paystubs
  - Tax Returns
- Receipts

## Technical Architecture
- **Backend Framework**: Python + Flask
- **Document Processing**: PyPDF2
- **ML/AI Pipeline**:
  - TF-IDF Vectorization
  - Naive Bayes Classification
  - Named Entity Recognition
- **Frontend**: HTML + JavaScript + Tailwind CSS
- **Database**: SQLite
- **Deployment**: Hugging Face Spaces
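
The TF-IDF and Naive Bayes stages listed above can be illustrated with a minimal, dependency-free sketch. It is not the project's actual code: the class name `TfidfNaiveBayes`, the label names, and the tiny training set are assumptions made for illustration only.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weighted term counts (illustrative)."""

    def fit(self, texts, labels):
        docs = [Counter(tokenize(t)) for t in texts]
        n_docs = len(docs)
        # Smoothed inverse document frequency: rarer terms get larger weights.
        doc_freq = Counter(term for doc in docs for term in doc)
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1
                    for t, df in doc_freq.items()}
        # Accumulate TF-IDF mass per (label, term).
        self.term_weight = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            for term, tf in doc.items():
                self.term_weight[label][term] += tf * self.idf[term]
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(n / n_docs) for c, n in label_counts.items()}
        self.total_weight = {c: sum(self.term_weight[c].values())
                             for c in label_counts}
        return self

    def predict(self, text):
        counts = Counter(tokenize(text))
        vocab_size = len(self.idf)
        best_label, best_score = None, -math.inf
        for label, prior in self.log_prior.items():
            score = prior
            for term, tf in counts.items():
                if term not in self.idf:
                    continue  # ignore terms never seen during training
                # Laplace smoothing keeps unseen (label, term) pairs non-zero.
                likelihood = ((self.term_weight[label][term] + 1)
                              / (self.total_weight[label] + vocab_size))
                score += tf * self.idf[term] * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training snippets; real input would be text extracted from PDFs.
training = [
    ("application for a new credit card account with annual fee", "bank_application"),
    ("savings account application form with opening deposit", "bank_application"),
    ("driver license number date of birth state issued", "identity_document"),
    ("passport number nationality date of birth place of issue", "identity_document"),
    ("paystub gross pay net pay employer deductions", "financial_document"),
    ("tax return adjusted gross income filing status", "financial_document"),
    ("receipt store total amount paid thank you", "receipt"),
]
texts, labels = zip(*training)
model = TfidfNaiveBayes().fit(texts, labels)
print(model.predict("passport nationality place of issue"))
```

This README does not say which library implements these stages. In practice, scikit-learn's `TfidfVectorizer` and `MultinomialNB` are a common way to build the same pipeline with far less code.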

## Key Features

### 1. Hierarchical Classification
- Person-level document association using:
  - Name matching
  - Government ID recognition
  - Email address extraction
- Document type categorization
- Automatic grouping of similar documents
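
One way to picture the person-level association described above is to link any two documents that share an identifier (name, government ID, or email) and then read off the connected groups. The sketch below does this with a small union-find. The document fields (`file`, `name`, `gov_id`, `text`) and the helper names are hypothetical, and exact name matching is a simplification that a real system would likely relax.

```python
import re
from collections import defaultdict

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def identifiers(doc):
    """Collect normalized identifiers found in one document record."""
    ids = set()
    if doc.get("name"):
        ids.add(("name", " ".join(re.findall(r"[a-z]+", doc["name"].lower()))))
    if doc.get("gov_id"):
        ids.add(("gov_id", re.sub(r"\W", "", doc["gov_id"]).upper()))
    for email in EMAIL_RE.findall(doc.get("text", "")):
        ids.add(("email", email.lower()))
    return ids


def group_by_person(docs):
    """Group file names so that documents sharing any identifier end up together."""
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent links to the group root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}
    for i, doc in enumerate(docs):
        for ident in identifiers(doc):
            if ident in first_seen:
                parent[find(i)] = find(first_seen[ident])  # merge the two groups
            else:
                first_seen[ident] = i

    groups = defaultdict(list)
    for i, doc in enumerate(docs):
        groups[find(i)].append(doc["file"])
    return sorted(groups.values())


# Hypothetical records, as they might look after text extraction.
docs = [
    {"file": "license.pdf", "name": "Jane Q. Doe", "gov_id": "D123 4567", "text": ""},
    {"file": "card_application.pdf", "name": "", "gov_id": "d1234567",
     "text": "Contact: jane.doe@example.com"},
    {"file": "paystub.pdf", "name": "", "gov_id": "",
     "text": "Employee email JANE.DOE@example.com"},
    {"file": "passport.pdf", "name": "John Smith", "gov_id": "P998877", "text": ""},
]
person_groups = group_by_person(docs)
print(person_groups)
```

Here the paystub carries no ID number, yet it still joins the same group as the license because it shares an email address with the card application, which in turn shares the ID number with the license.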

### 2. Information Extraction
- Automated extraction of:
  - Personal information
  - Financial data
  - Document dates
  - Account numbers
  - Government ID numbers
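
A rough sketch of how some of the fields above could be pulled from extracted text with regular expressions follows. The patterns and field names are assumptions chosen for the example, not the project's actual rules. Government ID formats vary by issuer, so they are left out here.

```python
import re

# Illustrative patterns only; real documents will need broader, tested rules.
PATTERNS = {
    "emails": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # ISO dates (2024-11-30) or slash dates (12/01/2024).
    "dates": re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{4}\b"),
    # Dollar amounts with optional thousands separators and cents.
    "amounts": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
    # 8 to 12 digits following an "Account", "Account No.", or "Account Number" label.
    "account_numbers": re.compile(
        r"\baccount\s*(?:no\.?|number|#)?\s*:?\s*(\d{8,12})\b", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return every match for each field, in order of appearance."""
    return {field: pattern.findall(text) for field, pattern in PATTERNS.items()}


sample = (
    "Account Number: 0012345678\n"
    "Statement date: 2024-11-30\n"
    "Net pay: $2,450.00 issued 12/01/2024\n"
    "Contact jane.doe@example.com"
)
fields = extract_fields(sample)
print(fields)
```

Regular expressions work well for rigidly formatted fields such as dates and account numbers. Free-form fields such as names and addresses are where the Named Entity Recognition stage listed under Technical Architecture would come in.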

### 3. Processing Pipeline
- Batch document upload
- Real-time processing
- Error handling and validation
- Progress tracking
- Results summary
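
The pipeline steps above (validation, per-file error handling, progress tracking, and a final summary) can be sketched as a single loop. The names `BatchSummary`, `process_batch`, `process_one`, and `on_progress` are hypothetical. The only format fact relied on is that PDF files begin with the bytes `%PDF-`.

```python
from dataclasses import dataclass, field


@dataclass
class BatchSummary:
    """Outcome of one batch: per-file results plus per-file error messages."""
    processed: list = field(default_factory=list)
    failed: dict = field(default_factory=dict)


def process_batch(files, process_one, on_progress=None):
    """Run process_one on each file's bytes; one bad file never stops the batch."""
    summary = BatchSummary()
    total = len(files)
    for index, (name, data) in enumerate(files.items(), start=1):
        try:
            # Validation: reject anything that does not start with the PDF header.
            if not data.startswith(b"%PDF-"):
                raise ValueError("not a PDF file")
            summary.processed.append((name, process_one(data)))
        except Exception as exc:
            summary.failed[name] = str(exc)
        if on_progress:
            on_progress(index, total)  # progress tracking hook
    return summary


# Hypothetical upload: one valid PDF header and one non-PDF file.
uploads = {
    "application.pdf": b"%PDF-1.7 demo",
    "notes.txt": b"plain text",
}
progress_events = []
summary = process_batch(
    uploads,
    process_one=len,  # stand-in for classification and extraction
    on_progress=lambda done, total: progress_events.append((done, total)),
)
print(f"{len(summary.processed)} processed, {len(summary.failed)} failed")
```

Catching errors per file, rather than around the whole loop, is what lets the results summary report partial success for a large upload.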

## Getting Started

### Prerequisites
- Python 3.9+
- pip
- A virtual environment (recommended)

### Installation
1. Clone the repository
   ```bash
   git clone https://github.com/yourusername/appian-document-processor.git
   cd appian-document-processor
   ```

2. Install dependencies
   ```bash
   pip install -r requirements.txt
   ```

3. Run the application
   ```bash
   python app.py
   ```

4. Open `http://localhost:7860` in a browser

## Team Members
- Sanjay Malladi

## License
MIT License

## Acknowledgments
- Appian AI Challenge Team
- IIT Madras
- Open Source Community

---
*Developed for the Appian AI Challenge 2024-25 at IIT Madras*