<p style="font-size:70px; font-weight:bold; text-align:center;">
Image Data Extractor
</p>
<hr>
# Overview:
The **Image Data Extractor** is a Python-based tool that extracts and structures text from images of visiting cards using **PaddleOCR**. The extracted text is processed to identify and organize key information such as name, designation, contact number, address, and company name. The **Mistral 7B** model performs the text analysis; if it becomes unavailable, the system falls back to the **Gliner `urchade/gliner_medium-v2.1`** model.

Both **Mistral 7B** and **Gliner `urchade/gliner_medium-v2.1`** are used under the **Apache 2.0 license**.

---
# Installation Guide:
1. **Create and Activate a Virtual Environment**
```bash
python -m venv venv
# Linux/macOS
source venv/bin/activate
# Windows (Command Prompt or PowerShell)
venv\Scripts\activate
```
2. **Install Required Libraries**
```bash
pip install -r requirements.txt
```
3. **Set up Hugging Face Token**
- Add your Hugging Face token to the `.env` file in the project root:
```
HF_TOKEN=<your_huggingface_token>
```
4. **Run the Application**
- With Docker:
```bash
docker-compose up --build
```
- Without Docker:
```bash
python app.py
```
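To confirm the token is picked up before starting the app, the sketch below reads `HF_TOKEN` from a `.env` file using only the standard library. It is an illustration under assumptions: the helper name `load_env_file` is hypothetical, it handles only simple `KEY=value` lines, and the application itself may load `.env` differently (for example with the `python-dotenv` package).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines from a .env file (hypothetical helper).

    Blank lines, comment lines starting with '#', and lines without '=' are
    skipped. Surrounding quotes are stripped from values. Variables already
    set in the environment are not overwritten.
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"').strip("'")
        values[key] = value
        # Keep any value already exported in the shell.
        os.environ.setdefault(key, value)
    return values


if __name__ == "__main__":
    load_env_file()
    token = os.environ.get("HF_TOKEN")
    print("HF_TOKEN found" if token else "HF_TOKEN missing: add it to .env")
```

Using `os.environ.setdefault` means a token exported in the shell takes precedence over the file, which is convenient when testing different tokens.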
---
# File Structure Overview:
```
ImageDataExtractor/
│
├── app.py                  # Main Flask app
├── requirements.txt        # Dependencies
├── Dockerfile              # Docker container setup
├── docker-compose.yml      # Docker Compose setup
│
├── utility/
│   └── utils.py            # PaddleOCR integration, image preprocessing, and Mistral model processing
│
├── template/
│   ├── index.html          # UI for image uploads
│   └── result.html         # Displays extracted results
│
├── Backup/
│   ├── modules/            # Base classes for data processing models
│   │   ├── base.py
│   │   ├── data_proc.py
│   │   ├── evaluator.py
│   │   ├── layers.py
│   │   ├── run_evaluation.py
│   │   ├── span_rep.py
│   │   └── token_rep.py
│   ├── backup.py           # Gliner model integration and fallback logic
│   ├── model.py
│   ├── save_load.py
│   └── train.py
│
└── .env                    # Environment variables (includes Hugging Face token)
```
---
# Program Overview:
### PaddleOCR Integration (utility/utils.py):
- **Text Extraction**: The tool uses **PaddleOCR** to extract text from images of visiting cards (PNG, JPG, JPEG).
- **Preprocessing**: Applies basic image preprocessing before OCR to improve text recognition.
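PaddleOCR returns detected text as a nested structure of boxes, strings, and confidence scores, which has to be flattened into plain text before it can be passed to a language model. The sketch below assumes the format returned by `PaddleOCR.ocr()` in PaddleOCR 2.x (one list per image, each entry `[box, (text, score)]`, or `None` when nothing is detected). The helper name `ocr_result_to_lines` and the 0.5 confidence threshold are hypothetical choices, not taken from `utility/utils.py`.

```python
from typing import Any, Sequence


def ocr_result_to_lines(result: Sequence[Any], min_score: float = 0.5) -> list[str]:
    """Flatten PaddleOCR 2.x-style output into text lines (hypothetical helper).

    Each page in ``result`` is expected to be a list of ``[box, (text, score)]``
    entries, or ``None`` when no text was detected on that page.
    Lines below ``min_score`` confidence or empty after stripping are dropped.
    """
    lines: list[str] = []
    for page in result:
        if not page:  # skip pages with no detections
            continue
        for _box, (text, score) in page:
            text = text.strip()
            if text and score >= min_score:
                lines.append(text)
    return lines


# With a real PaddleOCR 2.x instance (requires the paddleocr package):
#   from paddleocr import PaddleOCR
#   ocr = PaddleOCR(use_angle_cls=True, lang="en")
#   lines = ocr_result_to_lines(ocr.ocr("card.jpg", cls=True))

sample = [[
    [[[10, 10], [200, 10], [200, 40], [10, 40]], ("Jane Doe", 0.98)],
    [[[10, 50], [220, 50], [220, 80], [10, 80]], ("Senior Engineer", 0.95)],
    [[[10, 90], [120, 90], [120, 110], [10, 110]], ("~~", 0.21)],
]]
print("\n".join(ocr_result_to_lines(sample)))
```

Filtering low-confidence fragments at this stage keeps OCR noise (stray symbols, logo artifacts) out of the text the model has to structure.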
### Mistral 7B Integration (utility/utils.py):
- **Data Structuring**: After text extraction, the **Mistral 7B model** processes the extracted data, structuring it into fields such as name, designation, contact number, address, and company name.
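One common way to get fields like these out of an instruction-tuned model is to ask it for JSON and then validate the reply. The sketch below is an assumption, not the code in `utility/utils.py`: it supposes the model was prompted to answer with a JSON object using the hypothetical keys `name`, `designation`, `contact_number`, `address`, and `company_name`, and it tolerates extra text around the object.

```python
import json
import re
from dataclasses import asdict, dataclass, fields


@dataclass
class CardFields:
    """Structured visiting-card data (field names are illustrative)."""
    name: str = ""
    designation: str = ""
    contact_number: str = ""
    address: str = ""
    company_name: str = ""


_KNOWN_KEYS = {f.name for f in fields(CardFields)}


def parse_model_reply(reply: str) -> CardFields:
    """Parse the JSON object in a model reply and map it to CardFields.

    Missing keys or null values become empty strings; unknown keys are ignored.
    Raises ValueError (json.JSONDecodeError is a subclass) if nothing parses.
    """
    # Greedy match from the first '{' to the last '}' so nested braces survive.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(match.group(0))
    return CardFields(**{
        key: "" if value is None else str(value).strip()
        for key, value in data.items()
        if key in _KNOWN_KEYS
    })


reply = 'Here is the data: {"name": "Jane Doe", "designation": "CTO", "company_name": "Acme"}'
print(asdict(parse_model_reply(reply)))
```

Validating against a fixed dataclass means downstream code (such as the result page) always sees the same five fields, even when the model omits some or adds others.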
### Fallback Mechanism (Backup/backup.py):
- **Gliner `urchade/gliner_medium-v2.1` Model**: If the Mistral model is unavailable, the system uses the Gliner model to perform the same extraction, so requests can still be served.
- **Error Handling**: Detects when the Mistral model cannot be used and switches to the Gliner fallback.
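The fallback behaviour described above follows a try-primary-then-fallback pattern. The sketch below shows that pattern with plain callables; `extract_with_fallback`, `mistral_extract`, and `gliner_extract` are hypothetical names, and the real code in `Backup/backup.py` may decide that Mistral is unavailable in a different way.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# An extractor takes OCR text and returns a dict of structured fields.
Extractor = Callable[[str], dict]


def extract_with_fallback(text: str, primary: Extractor, fallback: Extractor) -> tuple[dict, str]:
    """Run the primary extractor; if it raises, run the fallback instead.

    Returns the extracted fields and a label for the extractor that produced them.
    """
    try:
        return primary(text), "primary"
    except Exception as exc:  # any failure of the primary model triggers the fallback
        logger.warning("Primary model failed (%s); using fallback", exc)
        return fallback(text), "fallback"


def mistral_extract(text: str) -> dict:
    # Stand-in for a Mistral 7B call that is currently unavailable.
    raise ConnectionError("Mistral endpoint unavailable")


def gliner_extract(text: str) -> dict:
    # Stand-in for the Gliner-based extractor: treats the first line as the name.
    return {"name": text.splitlines()[0]}


fields, source = extract_with_fallback("Jane Doe\nCTO", mistral_extract, gliner_extract)
print(source, fields)
```

Returning the source label alongside the result makes it easy to log or display which model handled a given card.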
### Web Interface (app.py):
- **Flask API**: Provides endpoints for image uploads and displays the results in a structured manner.
- **HTML Interface**: A frontend for users to upload images of visiting cards and view the parsed results.
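Before an upload is handed to OCR, the web layer typically checks that it is one of the supported image types (PNG, JPG, JPEG). The sketch below is a hypothetical helper in the style commonly used with Flask uploads; it checks only the file extension and is not taken from `app.py`.

```python
from pathlib import PurePath

# Supported upload types, matching the formats listed for PaddleOCR input.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg"}


def allowed_file(filename: str) -> bool:
    """Return True if the filename has a supported image extension (case-insensitive)."""
    suffix = PurePath(filename).suffix.lower().lstrip(".")
    return suffix in ALLOWED_EXTENSIONS


for name in ["card.JPG", "scan.png", "notes.pdf", "no_extension"]:
    print(name, allowed_file(name))
```

An extension check does not verify file contents; in a Flask view it would usually be combined with `werkzeug.utils.secure_filename` before saving the file.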
---
# Tree Map of the Program:
```
app.py
├── Handles Flask API and web interface
├── Manages file uploads
├── Extracts text with PaddleOCR
├── Processes text with Mistral 7B
└── Displays structured results
utility/utils.py
├── PaddleOCR for text extraction
└── Mistral 7B for data structuring
Backup/backup.py
├── Gliner urchade/gliner_medium-v2.1 as fallback
└── Backup and error handling
```
---
# Licensing:
- **Mistral 7B model** is used under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).
- **Gliner `urchade/gliner_medium-v2.1` model** is used under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).
---
# Main Task:
The primary objective is to extract and structure data from visiting cards. The system identifies and organizes:
- **Name**
- **Designation**
- **Phone Number**
- **Address**
- **Company Name**
---
# References:
- [PaddleOCR Documentation](https://github.com/PaddlePaddle/PaddleOCR)
- [Mistral 7B Documentation](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/blob/main/README.md)
- [Gliner urchade/gliner_medium-v2.1 Documentation](https://huggingface.co/urchade/gliner_medium-v2.1/blob/main/README.md)
- [Flask Documentation](https://flask.palletsprojects.com/)
- [Docker Documentation](https://docs.docker.com/)
- [Virtual Environments in Python](https://docs.python.org/3/tutorial/venv.html)
---