realsanjay committed on
Commit 1015649 · verified · 1 Parent(s): dd85135

Upload README.md

Files changed (1)
  1. README.md +111 -0
README.md ADDED
@@ -0,0 +1,111 @@
+ ---
+ title: Document Processor
+ emoji: 🐠
+ colorFrom: yellow
+ colorTo: blue
+ sdk: docker
+ pinned: false
+ ---
+ # 🏦 Appian Credit Union - Smart Document Processor AI
+
+ ## 🎯 Problem Statement
+ Appian Credit Union receives thousands of PDF documents daily that must be classified, verified, and organized. This project automates that workflow with AI, reducing manual effort and processing time.
+
+ ## 💡 Innovation Highlights
+ - 🤖 Hierarchical document classification system
+ - 👤 Intelligent person-document association
+ - 📊 Automated metadata extraction
+ - 🔄 Batch processing capabilities
+ - 🎨 Modern, intuitive UI
+
+ ## 🎯 Document Types Supported
+ - 💳 Bank Account Applications
+   - Credit Card Applications
+   - Savings Account Applications
+ - 🪪 Identity Documents
+   - Driver's License
+   - State/Country ID
+   - Passport
+ - 📊 Financial Documents
+   - Income Statements
+   - Paystubs
+   - Tax Returns
+ - 🧾 Receipts
+
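The two-level list above can be held as a simple mapping from category to subtypes. The sketch below is illustrative only: `DOCUMENT_TAXONOMY` and `parent_category` are hypothetical names, not identifiers taken from the project's code.

```python
from typing import Optional

# Hypothetical taxonomy mirroring the supported document types listed above.
DOCUMENT_TAXONOMY = {
    "Bank Account Applications": [
        "Credit Card Applications",
        "Savings Account Applications",
    ],
    "Identity Documents": ["Driver's License", "State/Country ID", "Passport"],
    "Financial Documents": ["Income Statements", "Paystubs", "Tax Returns"],
    "Receipts": [],  # no subtypes listed
}


def parent_category(doc_type: str) -> Optional[str]:
    """Return the top-level category for a document type, or None if unknown."""
    for category, subtypes in DOCUMENT_TAXONOMY.items():
        if doc_type == category or doc_type in subtypes:
            return category
    return None
```

For example, `parent_category("Passport")` resolves to `"Identity Documents"`, while an unlisted type returns `None`.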
+ ## 🛠️ Technical Architecture
+ - **Backend Framework**: Python + Flask
+ - **Document Processing**: PyPDF2
+ - **ML/AI Pipeline**:
+   - TF-IDF Vectorization
+   - Naive Bayes Classification
+   - Named Entity Recognition
+ - **Frontend**: HTML + JavaScript + Tailwind CSS
+ - **Database**: SQLite
+ - **Deployment**: Hugging Face Spaces
+
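To make the TF-IDF plus Naive Bayes step concrete, here is a from-scratch sketch using only the standard library. It is not the project's implementation (which may well rely on a library such as scikit-learn); the class name `TfidfNaiveBayes`, the tokenizer, and the tiny training set are all assumptions made for illustration, and the TF-IDF weights are left unnormalized to keep the code short.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class TfidfNaiveBayes:
    """Toy multinomial Naive Bayes over unnormalized TF-IDF weights."""

    def fit(self, texts, labels):
        docs = [Counter(tokenize(t)) for t in texts]
        n_docs = len(docs)
        doc_freq = Counter()
        for doc in docs:
            doc_freq.update(doc.keys())
        # Smoothed inverse document frequency per term.
        self.idf = {
            term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()
        }
        # Summed TF-IDF weight of each term within each class.
        self.class_weights = {label: Counter() for label in set(labels)}
        for doc, label in zip(docs, labels):
            for term, tf in doc.items():
                self.class_weights[label][term] += tf * self.idf[term]
        self.class_totals = {
            label: sum(weights.values())
            for label, weights in self.class_weights.items()
        }
        label_counts = Counter(labels)
        self.log_prior = {
            label: math.log(count / n_docs)
            for label, count in label_counts.items()
        }
        return self

    def predict(self, text):
        # Terms never seen during training are ignored.
        counts = Counter(t for t in tokenize(text) if t in self.idf)
        scores = {}
        for label, prior in self.log_prior.items():
            # Laplace smoothing (alpha = 1) over the vocabulary.
            denom = self.class_totals[label] + len(self.idf)
            score = prior
            for term, tf in counts.items():
                likelihood = (self.class_weights[label][term] + 1.0) / denom
                score += tf * self.idf[term] * math.log(likelihood)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training snippets; real input would be text extracted from PDFs.
TRAINING = [
    ("passport number nationality date of birth", "Identity Documents"),
    ("driver license class expiry state", "Identity Documents"),
    ("gross pay net pay deductions employer paystub", "Financial Documents"),
    ("tax return adjusted gross income refund", "Financial Documents"),
    ("receipt total amount paid store cashier", "Receipts"),
]

model = TfidfNaiveBayes().fit(
    [text for text, _ in TRAINING], [label for _, label in TRAINING]
)
print(model.predict("net pay and deductions for this pay period"))
```

With so little training data the model only separates documents that share vocabulary with the examples; a realistic classifier would need a labeled corpus per document type.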
+ ## ✨ Key Features
+
+ ### 1. Hierarchical Classification
+ - Person-level document association using:
+   - Name matching
+   - Government ID recognition
+   - Email address extraction
+ - Document type categorization
+ - Automatic grouping of similar documents
+
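Person-level association as described above can be sketched as picking the strongest identifier found in each document and grouping on it. Everything here is an assumption for illustration: the government ID pattern (two letters followed by seven digits) is a made-up format, and `person_key` and `group_by_person` are hypothetical helper names.

```python
import re
from collections import defaultdict

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed ID format for illustration: two capital letters, then seven digits.
GOV_ID_RE = re.compile(r"\b[A-Z]{2}\d{7}\b")


def normalize_name(name):
    """Lowercase a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def person_key(text, name=None):
    """Prefer a government ID, then an email address, then the name."""
    gov_id = GOV_ID_RE.search(text)
    if gov_id:
        return ("gov_id", gov_id.group())
    email = EMAIL_RE.search(text)
    if email:
        return ("email", email.group().lower())
    if name:
        return ("name", normalize_name(name))
    return ("unknown", "")


def group_by_person(documents):
    """Group filenames by the identifier extracted from each document."""
    groups = defaultdict(list)
    for doc in documents:
        key = person_key(doc["text"], doc.get("name"))
        groups[key].append(doc["filename"])
    return dict(groups)
```

This simple version does not merge a document that only carries an email with one that only carries an ID for the same person; linking identifiers across documents would need an extra lookup table.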
+ ### 2. Information Extraction
+ - Automated extraction of:
+   - Personal information
+   - Financial data
+   - Document dates
+   - Account numbers
+   - Government ID numbers
+
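A minimal sketch of field extraction with regular expressions follows. The patterns are assumptions about how such fields might appear in text (for example, account numbers of 8 to 12 digits after the word "Account"); they are not the project's actual rules, and `extract_metadata` is a hypothetical name.

```python
import re

# Illustrative patterns only; real documents will need broader rules.
PATTERNS = {
    "emails": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # ISO dates (YYYY-MM-DD) or slash dates (M/D/YYYY).
    "dates": re.compile(r"\b(?:\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{4})\b"),
    # Dollar amounts with optional thousands separators and cents.
    "amounts": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
    # Assumed format: "Account", optional "Number"/"No."/"#", then 8-12 digits.
    "account_numbers": re.compile(
        r"(?i)\baccount\s*(?:number|no\.?|#)?\s*:?\s*(\d{8,12})\b"
    ),
}


def extract_metadata(text):
    """Return every match for each field as a list of strings."""
    return {field: pattern.findall(text) for field, pattern in PATTERNS.items()}


sample = (
    "Account Number: 1234567890\n"
    "Statement date: 2024-11-05\n"
    "Net pay: $2,450.00\n"
    "Contact: jane.doe@example.com"
)
print(extract_metadata(sample))
```

Keeping the patterns in one dictionary makes it easy to add a field (such as a government ID format) without touching the extraction function.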
+ ### 3. Processing Pipeline
+ - Batch document upload
+ - Real-time processing
+ - Error handling and validation
+ - Progress tracking
+ - Results summary
+
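The pipeline stages above (batch input, validation, error handling, progress tracking, summary) can be sketched as a single loop. This is an assumed structure, not the project's code: `BatchSummary`, `process_batch`, and the `process_one` callback are hypothetical, and `process_one` stands in for whatever classifies a single PDF.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional


@dataclass
class BatchSummary:
    total: int = 0
    succeeded: Dict[str, str] = field(default_factory=dict)  # filename -> result
    failed: Dict[str, str] = field(default_factory=dict)  # filename -> error


def process_batch(
    filenames: Iterable[str],
    process_one: Callable[[str], str],
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> BatchSummary:
    """Process each file, recording failures instead of aborting the batch."""
    names = list(filenames)
    summary = BatchSummary(total=len(names))
    for index, name in enumerate(names, start=1):
        try:
            # Validation: only PDF uploads are accepted in this sketch.
            if not name.lower().endswith(".pdf"):
                raise ValueError("only PDF files are supported")
            summary.succeeded[name] = process_one(name)
        except Exception as exc:  # keep going; report the error in the summary
            summary.failed[name] = str(exc)
        if on_progress is not None:
            on_progress(index, summary.total)
    return summary
```

A caller would pass a function that reads and classifies one file, plus an optional progress callback, and then render `summary.succeeded` and `summary.failed` as the results view.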
+ ## 🚀 Getting Started
+
+ ### Prerequisites
+ ```text
+ Python 3.9+
+ pip
+ Virtual environment (recommended)
+ ```
+
+ ### Installation
+ 1. Clone the repository (replace `yourusername` with the repository owner)
+ ```bash
+ git clone https://github.com/yourusername/appian-document-processor.git
+ cd appian-document-processor
+ ```
+
+ 2. Install dependencies
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. Run the application
+ ```bash
+ python app.py
+ ```
+
+ 4. Open `http://localhost:7860` in your browser
+
+ ## 👥 Team Members
+ - Sanjay Malladi
+
+ ## 📝 License
+ MIT License
+
+ ## 🤝 Acknowledgments
+ - Appian AI Challenge Team
+ - IIT Madras
+ - Open Source Community
+
+ ---
+ *Developed for the Appian AI Challenge 2024-25 at IIT Madras*