Image Data Extractor


# Overview:

The **Image Data Extractor** is a Python-based tool that extracts and structures text from images of visiting cards using **PaddleOCR**. The extracted text is processed to identify and organize key information such as name, designation, contact number, address, and company name. The **Mistral 7B model** performs the text analysis; if it becomes unavailable, the system falls back to the **Gliner urchade/gliner_medium-v2.1** model. Both models are used under the **Apache 2.0 license**.

---

# Installation Guide:

1. **Create and Activate a Virtual Environment**

   ```bash
   python -m venv venv
   source venv/bin/activate   # For Linux/Mac
   # or
   venv\Scripts\activate      # For Windows
   ```

2. **Install Required Libraries**

   ```bash
   pip install -r requirements.txt
   ```

3. **Run the Application**

   - If Docker is being used:

     ```bash
     docker-compose up --build
     ```

   - Without Docker:

     ```bash
     python app.py
     ```

4. **Set up Hugging Face Token**

   - Add your Hugging Face token in the `.env` file:

     ```bash
     HF_TOKEN=
     ```

---

# File Structure Overview:

```
ImageDataExtractor/
│
├── app.py                  # Main Flask app
├── requirements.txt        # Dependencies
├── Dockerfile              # Docker container setup
├── docker-compose.yml      # Docker Compose setup
│
├── utility/
│   └── utils.py            # PaddleOCR integration, image preprocessing and Mistral model processing
│
├── template/
│   ├── index.html          # UI for image uploads
│   └── result.html         # Display extracted results
│
├── Backup/
│   ├── modules/            # Base classes for data processing models
│   │   ├── base.py
│   │   ├── data_proc.py
│   │   ├── evaluator.py
│   │   ├── layers.py
│   │   ├── run_evaluation.py
│   │   ├── span_rep.py
│   │   └── token_rep.py
│   ├── backup.py           # Gliner model integration and backup logic
│   ├── model.py
│   ├── save_load.py
│   └── train.py
│
└── .env                    # Environment variables (includes Hugging Face token)
```

---

# Program Overview:

### PaddleOCR Integration (utility/utils.py):

- **Text Extraction**:
  The tool uses **PaddleOCR** to extract text from images (PNG, JPG, JPEG) of visiting cards.
- **Preprocessing**: Handles basic image preprocessing to improve OCR text recognition.

### Mistral 7B Integration (utility/utils.py):

- **Data Structuring**: After text extraction, the **Mistral 7B model** structures the extracted text into fields such as name, designation, contact number, address, and company name.

### Fallback Mechanism (Backup/backup.py):

- **Gliner urchade/gliner_medium-v2.1 Model**: If the Mistral model is unavailable, the system uses the **Gliner urchade/gliner_medium-v2.1 model** to perform the same task, so the service keeps running.
- **Error Handling**: Detects when a model is unavailable and switches to the fallback.

### Web Interface (app.py):

- **Flask API**: Provides endpoints for image uploads and displays the results in a structured manner.
- **HTML Interface**: A frontend for users to upload images of visiting cards and view the parsed results.

---

# Tree Map of the Program:

```
app.py
├── Handles Flask API and web interface
├── Manages file upload
├── Extracts text with PaddleOCR
├── Processes text with Mistral 7B
└── Displays structured results

utility/utils.py
├── PaddleOCR for text extraction
└── Mistral 7B for data structuring

Backup/backup.py
├── Gliner urchade/gliner_medium-v2.1 as fallback
└── Backup and error handling
```

---

# Licensing:

- **Mistral 7B model** is used under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).
- **Gliner urchade/gliner_medium-v2.1 model** is used under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).

---

# Main Task:

The primary objective is to extract and structure data from visiting cards.
The system identifies and organizes:

- **Name**
- **Designation**
- **Phone Number**
- **Address**
- **Company Name**

---

# References:

- [PaddleOCR Documentation](https://github.com/PaddlePaddle/PaddleOCR)
- [Mistral 7B Documentation](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/blob/main/README.md)
- [Gliner urchade/gliner_medium-v2.1 Documentation](https://huggingface.co/urchade/gliner_medium-v2.1/blob/main/README.md)
- [Flask Documentation](https://flask.palletsprojects.com/)
- [Docker Documentation](https://docs.docker.com/)
- [Virtual Environments in Python](https://docs.python.org/3/tutorial/venv.html)

---
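
The fallback flow described under Program Overview (try Mistral 7B first, then switch to Gliner) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `utility/utils.py` or `Backup/backup.py`: `extract_fields`, `unavailable_primary`, `simple_fallback`, and the field keys in `FIELDS` are hypothetical names, and the two stand-in extractors only simulate a failing primary model and a trivial backup.

```python
from typing import Callable, Dict

# Hypothetical keys for the five fields the system organizes.
FIELDS = ("name", "designation", "phone", "address", "company")

# An extractor takes OCR text and returns a (possibly partial) field mapping.
Extractor = Callable[[str], Dict[str, str]]


def extract_fields(ocr_text: str, primary: Extractor, fallback: Extractor) -> Dict[str, str]:
    """Try the primary extractor; if it raises, use the fallback instead."""
    try:
        result = primary(ocr_text)
    except Exception:
        # Assumption: any exception is treated as "primary model unavailable".
        result = fallback(ocr_text)
    # Normalize the output so every expected field is present, even if empty.
    return {field: result.get(field, "") for field in FIELDS}


# Hypothetical stand-in for a primary model call that is currently failing.
def unavailable_primary(ocr_text: str) -> Dict[str, str]:
    raise ConnectionError("primary model unavailable")


# Hypothetical stand-in for the backup model: treats the first line as the name.
def simple_fallback(ocr_text: str) -> Dict[str, str]:
    lines = ocr_text.splitlines()
    return {"name": lines[0] if lines else ""}
```

Calling `extract_fields("Jane Doe\nEngineer", unavailable_primary, simple_fallback)` takes the fallback path and returns a dictionary with `name` set to `"Jane Doe"` and the remaining fields empty. Passing the extractors in as arguments keeps the fallback logic independent of how either model is actually loaded.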