WebashalarForML committed on
Commit
06f01a0
•
1 Parent(s): 49b2f52

Create README2.md

  1. README2.md +129 -0
README2.md ADDED
@@ -0,0 +1,129 @@
+ _\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\_
+ _\\-------- **Image Data Extractor** -------\\_
+ _\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\_
+
+ # Overview:
+ The **Image Data Extractor** is a Python tool that extracts text from images of visiting cards with **PaddleOCR** and structures it. It parses the extracted text into key fields (name, designation, contact number, address, and company name) and returns them in a well-defined structure. The **Mistral 7B model** performs the text analysis; if it is unavailable, the system falls back to the **Gliner urchade/gliner_mediumv2.1** model.
+
+ # Installation Guide:
+
+ 1. **Create and Activate a Virtual Environment**
+ ```bash
+ python -m venv venv
+ source venv/bin/activate # For Linux/Mac
+ # or
+ venv\Scripts\activate # For Windows
+ ```
+
+ 2. **Install Required Libraries**
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. **Run the Application**
+ - If Docker is being used:
+ ```bash
+ docker-compose up --build
+ ```
+ - Without Docker:
+ ```bash
+ python app.py
+ ```
+
+ 4. **Set up Hugging Face Token**
+ - Add your Hugging Face token to the `.env` file before starting the application:
+ ```bash
+ HF_TOKEN=<your_huggingface_token>
+ ```
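The README does not say how the application reads the token, so the following is only a minimal sketch of one way to do it with the standard library. The helper names `parse_env_text` and `get_hf_token` are hypothetical; the project may instead rely on a package such as `python-dotenv` or on Docker Compose to load `.env`.

```python
import os


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_hf_token(env_text: str = "") -> str:
    """Return HF_TOKEN from the process environment, else from .env text."""
    token = os.environ.get("HF_TOKEN") or parse_env_text(env_text).get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to the .env file")
    return token
```

Failing early with a clear error when the token is missing makes a misconfigured `.env` easier to spot than a later authentication failure.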
+
+ # File Structure Overview:
+
+ ```
+ ImageDataExtractor/
+ │
+ ├── app.py                  # Main Flask app
+ ├── requirements.txt        # Dependencies
+ ├── Dockerfile              # Docker container setup
+ ├── docker-compose.yml      # Docker Compose setup
+ │
+ ├── utility/
+ │   └── utils.py            # PaddleOCR integration, image preprocessing, and Mistral model processing
+ │
+ ├── template/
+ │   ├── index.html          # UI for image uploads
+ │   └── result.html         # Display extracted results
+ │
+ ├── Backup/
+ │   ├── modules/            # Base classes for data processing models
+ │   │   ├── base.py
+ │   │   ├── data_proc.py
+ │   │   ├── evaluator.py
+ │   │   ├── layers.py
+ │   │   ├── run_evaluation.py
+ │   │   ├── span_rep.py
+ │   │   └── token_rep.py
+ │   ├── backup.py           # Backup handling
+ │   ├── model.py            # Gliner model integration and backup logic
+ │   ├── save_load.py        # Model saving and loading
+ │   └── train.py            # Model training script
+ │
+ └── .env                    # Environment variables (includes Hugging Face token)
+ ```
+
+ # Program Overview:
+
+ ### PaddleOCR Integration (utility/utils.py):
+ - **Text Extraction**: The tool uses **PaddleOCR** to extract text from images (PNG, JPG, JPEG) of visiting cards.
+ - **Preprocessing**: Applies basic image preprocessing to improve OCR text recognition.
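The text-extraction step above might look roughly like the sketch below. It assumes the PaddleOCR 2.x interface, where `ocr.ocr(path, cls=True)` returns one list per image with entries shaped like `[box, (text, confidence)]`; newer releases changed this interface. The function names and the `min_confidence` threshold are illustrative and not taken from `utility/utils.py`.

```python
def ocr_lines(result, min_confidence=0.5):
    """Flatten a PaddleOCR 2.x-style result into a list of text lines.

    Assumed shape: one list per image, each entry [box, (text, confidence)].
    A page with no detected text may be None, so it is treated as empty.
    """
    lines = []
    for page in result or []:
        for _box, (text, confidence) in page or []:
            if confidence >= min_confidence and text.strip():
                lines.append(text.strip())
    return lines


def run_ocr(image_path):
    """Run PaddleOCR on one image (requires the paddleocr package)."""
    from paddleocr import PaddleOCR  # imported lazily: heavy dependency

    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    return ocr_lines(ocr.ocr(image_path, cls=True))
```

Keeping the flattening logic separate from the OCR call makes it possible to test the post-processing without loading the OCR models.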
+
+ ### Mistral 7B Integration (utility/utils.py):
+ - **Data Structuring**: After text extraction, the **Mistral 7B model** processes the extracted data, structuring it into fields such as name, designation, contact number, address, and company name.
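One plausible way to do this kind of structuring is to ask the model for a JSON object and parse the reply defensively. The sketch below shows only the prompt construction and reply parsing; the prompt wording, the key names in `FIELDS`, and both function names are assumptions, and the actual call to the Mistral 7B model is left out because the README does not describe it.

```python
import json

# Hypothetical key names for the fields listed in this README.
FIELDS = ("name", "designation", "contact_number", "address", "company_name")


def build_prompt(ocr_text: str) -> str:
    """Build an instruction asking for a JSON object with the known keys."""
    keys = ", ".join(FIELDS)
    return (
        "Extract the following fields from the visiting card text and reply "
        f"with a single JSON object using exactly these keys: {keys}. "
        "Use null for anything that is missing.\n\n"
        f"Card text:\n{ocr_text}"
    )


def parse_model_reply(reply: str) -> dict:
    """Parse the outermost {...} span of a reply and keep only known fields."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return {field: None for field in FIELDS}
    try:
        data = json.loads(reply[start : end + 1])
    except json.JSONDecodeError:
        data = {}
    return {field: data.get(field) for field in FIELDS}
```

Returning every expected key, with `None` for missing values, keeps the downstream template code free of key checks.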
+
+ ### Fallback Mechanism (Backup/backup.py):
+ - **Gliner urchade/gliner_mediumv2.1 Model**: If the Mistral model is unavailable, the system uses the **Gliner urchade/gliner_mediumv2.1 model** to perform the same task, so the service keeps running.
+ - **Error Handling**: Handles model-availability failures and switches to the fallback model.
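The fallback described above follows a common try-primary-then-backup pattern. The following is a generic sketch of that pattern, not the code in `Backup/backup.py`: `extract_with_fallback` is a hypothetical name, and `primary` and `fallback` stand in for whatever callables wrap the Mistral and Gliner models.

```python
import logging

logger = logging.getLogger(__name__)


def extract_with_fallback(text, primary, fallback):
    """Try the primary extractor; on any exception use the fallback.

    Returns (fields, source) so callers can see which model answered.
    """
    try:
        return primary(text), "primary"
    except Exception as exc:  # e.g. network error, missing token, timeout
        logger.warning("Primary model failed (%s); using fallback", exc)
        return fallback(text), "fallback"
```

Returning the source alongside the fields is a design choice that makes it easy to log or display which model produced a given result.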
+
+ ### Web Interface (app.py):
+ - **Flask API**: Provides endpoints for uploading images and displaying the structured results.
+ - **HTML Interface**: A frontend for users to upload images of visiting cards and view the parsed results.
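An upload endpoint typically validates the file type before passing the image to OCR. The sketch below shows such a check for the PNG, JPG, and JPEG formats mentioned earlier; the names `ALLOWED_EXTENSIONS` and `is_allowed_image` are illustrative and may not match what `app.py` does.

```python
from pathlib import Path

# Extensions taken from the formats this README lists for OCR input.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def is_allowed_image(filename: str) -> bool:
    """Return True if the file name ends in a supported image extension."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

In a Flask route, a check like this would usually be applied to the `filename` of each entry in `request.files` before the file is saved.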
+
+ # Tree Map of the Program:
+
+ ```
+ app.py
+ ├── Handles Flask API and web interface
+ ├── Manages file upload
+ ├── Extracts text with PaddleOCR
+ ├── Processes text with Mistral 7B
+ └── Displays structured results
+
+ utility/utils.py
+ ├── PaddleOCR for text extraction
+ └── Mistral 7B for data structuring
+
+ Backup/backup.py
+ ├── Gliner urchade/gliner_mediumv2.1 as fallback
+ └── Backup and error handling
+
+ Backup/model.py
+ └── Gliner model integration and processing logic
+ ```
+
+ # Main Task:
+ The main objective is to extract and structure text data from visiting cards. The system identifies and organizes:
+ - **Name**
+ - **Designation**
+ - **Phone Number**
+ - **Address**
+ - **Company Name**
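The five fields above could be represented as a small record type. This is a sketch only: the class name `VisitingCard`, the attribute names, and the sample values are placeholders, since the README does not specify the actual output format.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VisitingCard:
    """One extracted card; any field may be missing on a given image."""

    name: Optional[str] = None
    designation: Optional[str] = None
    phone_number: Optional[str] = None
    address: Optional[str] = None
    company_name: Optional[str] = None


# Placeholder values for illustration only.
card = VisitingCard(name="Asha Rao", company_name="Example Corp")
record = asdict(card)  # plain dict, ready for JSON or template rendering
```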
+
+ # References:
+
+ - [PaddleOCR Documentation](https://github.com/PaddlePaddle/PaddleOCR)
+ - [Mistral 7B Documentation](https://huggingface.co/)
+ - [Gliner urchade/gliner_mediumv2.1 Documentation](https://huggingface.co/)
+ - [Flask Documentation](https://flask.palletsprojects.com/)
+ - [Docker Documentation](https://docs.docker.com/)
+ - [Virtual Environments in Python](https://docs.python.org/3/tutorial/venv.html)
+
+ ---