Spaces:
Sleeping
Sleeping
File size: 3,822 Bytes
5af99a1 5a2fd5e 96586f0 5a2fd5e 5af99a1 0eaf4a1 5af99a1 3284b3e 5af99a1 1247d27 08f4db4 1247d27 96586f0 0eaf4a1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 |
---
title: img-read
emoji: π
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.23.3
app_file: app.py
pinned: false
short_description: 'Extract Hindi & English text from images and search keywords'
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Byaldi + Qwen2VL

## Overview
The **Byaldi + Qwen2VL** app is an innovative tool designed for extracting text from images using advanced OCR (Optical Character Recognition) techniques and natural language processing. This application leverages the **RAGMultiModalModel** from Byaldi and the **Qwen2VL** model for generating meaningful responses based on the extracted text.
This application also takes advantage of **ZeroGPU** to run efficiently on powerful hardware, specifically the **NVIDIA A100** GPU, ensuring high-speed processing and accurate results even for large and complex image inputs.
## Features
- **Image Upload**: Users can upload images from which text will be extracted.
- **Text Extraction**: Utilizes state-of-the-art models to accurately extract text from the uploaded images.
- **Keyword Search**: Allows users to search for specific keywords within the extracted text and highlights them.
- **High-Performance**: Runs on **ZeroGPU (NVIDIA A100)** for accelerated computation and efficient model execution.
- **User-Friendly Interface**: Built using Gradio for an interactive user experience.
## Technologies Used
- **Gradio**: For creating the web interface.
- **Byaldi RAGMultiModalModel**: For indexing and searching images.
- **Qwen2VL**: For generating responses based on visual and textual inputs.
- **ZeroGPU**: For efficient model inference using **NVIDIA A100**.
- **PyTorch**: For deep learning functionalities.
- **Pillow**: For image handling.
## Getting Started
### Prerequisites
- Python 3.8 or later
- Required libraries:
```bash
pip install gradio byaldi transformers torch pillow
## Installation
1. Clone the repository:
```bash
git clone <repository-url>
cd <repository-directory>
2. Install the required dependencies using pip.
3. Run the application:
```bash
python app.py
### Using the App
1. **Upload an Image**: Click on the "Upload an Image" button to select and upload an image containing text.
2. **Extract Text**: Press the "Extract Text" button to process the image and extract any text found.
3. **Search Keywords**: Enter keywords in the search box and click "Search" to highlight matching keywords in the extracted text.
## Code Overview
The core functionality of the application is encapsulated in the following sections:
- **OCR and Text Extraction**:
- The `ocr_and_extract` function processes the uploaded image, extracts text, and cleans the output to remove unnecessary labels.
- **Keyword Highlighting**:
- The `search_keywords` function takes the extracted text and user-defined keywords, highlighting matches within the text for better visibility.
## ZeroGPU Integration
The application is powered by **ZeroGPU**, leveraging the **NVIDIA A100** GPU. This ensures:
- Faster image processing and text extraction.
- Seamless handling of large-scale models like Qwen2VL.
- Optimal performance during high computational loads.
## Error Handling
The application includes basic error handling to capture and display any issues encountered during image processing. Errors will be printed to the console, and a user-friendly message will be displayed in the interface.
## References
- [Byaldi](https://huggingface.co/vidore/colpali) for providing the RAGMultiModalModel.
- [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) for state-of-the-art models.
- [ZeroGPU](https://www.zerogpu.com) for enabling efficient GPU computation with NVIDIA A100. |