<p style="font-size:70px; font-weight:bold; text-align:center;">
Image Data Extractor
</p>
<hr>

# Overview:
The **Image Data Extractor** is a Python tool that extracts text from images of visiting cards with **PaddleOCR** and structures it into key fields: name, designation, contact number, address, and company name. The **Mistral 7B** model performs the structuring; if it is unavailable, the system falls back to the **GLiNER** model `urchade/gliner_medium-v2.1`.
Both **Mistral 7B** and **GLiNER** `urchade/gliner_medium-v2.1` are used under the **Apache 2.0 license**.

---
# Installation Guide:

1. **Create and Activate a Virtual Environment**
    ```bash
    python -m venv venv
    source venv/bin/activate  # For Linux/Mac
    # or
    venv\Scripts\activate  # For Windows
    ```

2. **Install Required Libraries**
    ```bash
    pip install -r requirements.txt
    ```

3. **Run the Application**
    - If Docker is being used:
    ```bash
    docker-compose up --build
    ```
    - Without Docker:
    ```bash
    python app.py
    ```

4. **Set up Hugging Face Token**
    - Add your Hugging Face token to the `.env` file in the project root before starting the application:
    ```bash
    HF_TOKEN=<your_huggingface_token>
    ```
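
The application then needs to read this token at startup. The following is a minimal sketch of one way to do that with the standard library only. It assumes the variables in `.env` have already been placed in the process environment (for example by Docker Compose or a package such as `python-dotenv`); the helper name `get_hf_token` is hypothetical and not taken from the project code.

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise if it is missing."""
    # `env` defaults to the real process environment; passing a dict makes testing easy.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to the .env file or the environment.")
    return token
```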
---  
# File Structure Overview:

```
ImageDataExtractor/
│
├── app.py                       # Main Flask app
├── requirements.txt             # Dependencies
├── Dockerfile                   # Docker container setup
├── docker-compose.yml           # Docker Compose setup
│
├── utility/
│   └── utils.py                 # PaddleOCR integration, image preprocessing, and Mistral model processing
│
├── template/
│   ├── index.html               # UI for image uploads
│   └── result.html              # Display extracted results
│
├── Backup/
│   ├── modules/                 # Base classes for data processing models
│   │   ├── base.py
│   │   ├── data_proc.py
│   │   ├── evaluator.py
│   │   ├── layers.py
│   │   ├── run_evaluation.py
│   │   ├── span_rep.py
│   │   └── token_rep.py
│   ├── backup.py                # GLiNER model integration and fallback logic
│   ├── model.py
│   ├── save_load.py
│   └── train.py
│
└── .env                         # Environment variables (includes Hugging Face token)
```
---
# Program Overview:

### PaddleOCR Integration (utility/utils.py):
- **Text Extraction**: Uses **PaddleOCR** to extract text from images of visiting cards (PNG, JPG, JPEG).
- **Preprocessing**: Applies basic image preprocessing before OCR to improve text recognition.
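
As an illustration of the text extraction step, here is a hedged sketch rather than the project's actual code. It assumes the PaddleOCR 2.x Python API, where `ocr.ocr(path, cls=True)` returns one list per image containing `[box, (text, confidence)]` entries. The helper names `ocr_lines` and `run_paddleocr` and the 0.5 confidence threshold are hypothetical.

```python
def ocr_lines(result, min_confidence=0.5):
    """Flatten a PaddleOCR 2.x style result into a list of text lines."""
    lines = []
    for page in result or []:
        # A page is None when no text was detected on that image.
        for _box, (text, confidence) in page or []:
            if confidence >= min_confidence and text.strip():
                lines.append(text.strip())
    return lines


def run_paddleocr(image_path):
    """Run PaddleOCR on one image (requires the `paddleocr` package)."""
    # Imported lazily so that ocr_lines() can be used without the dependency.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    return ocr.ocr(image_path, cls=True)
```

Calling `ocr_lines(run_paddleocr("card.jpg"))` would then yield the recognised lines in detection order, with low-confidence fragments dropped.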

### Mistral 7B Integration (utility/utils.py):
- **Data Structuring**: After text extraction, the **Mistral 7B model** processes the extracted data, structuring it into fields such as name, designation, contact number, address, and company name.
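
The structuring step can be pictured as a prompt plus a tolerant parser for the model's reply. The sketch below is an assumption about how this might look, not the prompt or parsing used in `utility/utils.py`. The field keys, `build_prompt`, and `parse_reply` are hypothetical names, and the call to the Mistral model itself is left out.

```python
import json
import re

FIELDS = ["name", "designation", "contact_number", "address", "company_name"]


def build_prompt(ocr_text):
    """Build an instruction asking the model to return the card fields as JSON."""
    keys = ", ".join(FIELDS)
    return (
        "Extract the following fields from this visiting card text and reply "
        f"with a single JSON object using exactly these keys: {keys}. "
        "Use null for any field that is not present.\n\n"
        f"Card text:\n{ocr_text}"
    )


def parse_reply(reply):
    """Parse the JSON object in a model reply and keep only the known fields."""
    # Take everything from the first "{" to the last "}" so that surrounding
    # chatter such as "Sure, here you go:" is ignored.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(match.group(0))
    return {field: data.get(field) for field in FIELDS}
```

Restricting the output to a fixed key list keeps the result shape stable even when the model adds extra keys or omits some.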
  
### Fallback Mechanism (Backup/backup.py):
- **GLiNER `urchade/gliner_medium-v2.1` Model**: If the Mistral model is unavailable, the system uses the **GLiNER** `urchade/gliner_medium-v2.1` model to perform the same task, so the service keeps running.
- **Error Handling**: Catches failures when the Mistral model cannot be reached and switches to the fallback model.
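
The fallback pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the logic in `Backup/backup.py`: `extract_with_fallback` and `group_entities` are hypothetical helpers, and the two models are passed in as plain callables. The entity dictionaries follow the shape returned by the `gliner` package's `predict_entities`, which includes `text` and `label` keys.

```python
import logging

logger = logging.getLogger(__name__)


def extract_with_fallback(text, primary, fallback):
    """Try `primary(text)`; if it raises, log the error and use `fallback(text)`."""
    try:
        return primary(text), "primary"
    except Exception as exc:  # broad on purpose: any failure triggers the fallback
        logger.warning("Primary model failed (%s); using fallback model", exc)
        return fallback(text), "fallback"


def group_entities(entities):
    """Group GLiNER-style entity dicts ({'text': ..., 'label': ...}) by label."""
    grouped = {}
    for entity in entities:
        grouped.setdefault(entity["label"], []).append(entity["text"])
    return grouped
```

Returning the name of the model that produced the result makes it easy to show or log which path was taken for a given card.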

### Web Interface (app.py):
- **Flask API**: Provides endpoints for image uploads and displays the results in a structured manner.
- **HTML Interface**: A frontend for users to upload images of visiting cards and view the parsed results.
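
A minimal sketch of an upload endpoint with file-type validation is shown here for orientation. The route `/upload`, the form field name `file`, and the helper names are assumptions made for illustration; the actual routes in `app.py` may differ.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def is_allowed_image(filename):
    """Return True when the filename has one of the supported image extensions."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def create_app():
    """Build a small Flask app with one upload endpoint (requires Flask)."""
    # Imported lazily so that is_allowed_image() can be used without Flask.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/upload", methods=["POST"])
    def upload():
        file = request.files.get("file")
        if file is None or not is_allowed_image(file.filename):
            return jsonify(error="Please upload a PNG, JPG, or JPEG image."), 400
        # OCR and model processing would run here; this sketch only echoes the name.
        return jsonify(filename=file.filename)

    return app
```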
---
# Tree Map of the Program:

```
app.py
├── Handles Flask API and web interface
├── Manages file upload
├── Extracts text with PaddleOCR
├── Processes text with Mistral 7B
└── Displays structured results

utility/utils.py
├── PaddleOCR for text extraction
└── Mistral 7B for data structuring

Backup/backup.py
├── GLiNER urchade/gliner_medium-v2.1 as fallback
└── Backup and error handling
```
---
# Licensing:
- **Mistral 7B model** is used under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).
- **GLiNER `urchade/gliner_medium-v2.1` model** is used under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).
  
---
# Main Task:
The primary objective is to extract and structure data from visiting cards. The system identifies and organizes:
- **Name**
- **Designation**
- **Phone Number**
- **Address**
- **Company Name**

---
# References:

- [PaddleOCR Documentation](https://github.com/PaddlePaddle/PaddleOCR)
- [Mistral 7B Documentation](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/blob/main/README.md)
- [GLiNER urchade/gliner_medium-v2.1 Documentation](https://huggingface.co/urchade/gliner_medium-v2.1/blob/main/README.md)
- [Flask Documentation](https://flask.palletsprojects.com/)
- [Docker Documentation](https://docs.docker.com/)
- [Virtual Environments in Python](https://docs.python.org/3/tutorial/venv.html)
---