---
title: OCR IITRoorkie
emoji: 📚
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


# OCR and Keyword Search Web Application

This web application performs Optical Character Recognition (OCR) on uploaded images containing Hindi and English text, and provides keyword search over the extracted text.

## Setup

1. Install the required dependencies with `pip install -r requirements.txt`.
   This installs the key libraries, including transformers, gradio, pillow, tesseract, and pytesseract.

2. Install Tesseract OCR.
   On Windows, download and run the installer from https://github.com/UB-Mannheim/tesseract/wiki

3. Update the Tesseract path in the script. This was not needed when deploying to a Hugging Face Space, but I had to set it when running the app locally on my machine.
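
As a rough sketch of step 3, the snippet below looks for the Tesseract executable and, if pytesseract is installed, points it at that path through `pytesseract.pytesseract.tesseract_cmd`. The helper name `find_tesseract` and the fallback path are assumptions for illustration (the path is the usual default of the UB-Mannheim Windows installer); the actual `app.py` may set the path differently.

```python
import os
import shutil

# Assumed default install location of the UB-Mannheim Windows installer.
# Change this if Tesseract was installed somewhere else.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=DEFAULT_WINDOWS_PATH):
    """Return a path to the tesseract executable, or None if none is found.

    Hypothetical helper, not part of pytesseract.
    """
    # Prefer an executable that is already on PATH; in that case
    # no manual path update is needed at all.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the assumed install location, if it exists.
    if os.path.isfile(fallback):
        return fallback
    return None


try:
    import pytesseract
except ImportError:
    # pytesseract is listed in requirements.txt; skip quietly if it is missing.
    pytesseract = None

tesseract_path = find_tesseract()
if pytesseract is not None and tesseract_path is not None:
    # Tell pytesseract which executable to call.
    pytesseract.pytesseract.tesseract_cmd = tesseract_path
```

Checking `PATH` first means the same code can run unchanged in environments where Tesseract is already discoverable, and only falls back to the hard-coded location on a local Windows machine.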

## Running Locally

To run the application locally, run `python app.py`.

## Deployment

To deploy on Hugging Face Spaces:

1. Create a new Space on Hugging Face.
2. While creating the Space, set the Space SDK to Gradio.
3. Upload the `app.py` file, and create `requirements.txt` and `packages.txt` listing the Python libraries and system packages, respectively.

## Usage

1. Upload an image containing Hindi and English text.
2. Enter a keyword to search within the extracted text.
3. The application will display the extracted text and search results.
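
The keyword search in step 2 could look roughly like the sketch below: a case-insensitive match that returns every line of the extracted text containing the keyword. The function name `search_keyword` and the sample text are made up for illustration, and the real app may search or highlight matches differently.

```python
def search_keyword(text, keyword):
    """Return the lines of `text` that contain `keyword`, ignoring case.

    Hypothetical helper; works on any Unicode text, including Devanagari.
    """
    keyword = keyword.strip()
    if not keyword:
        # An empty query matches nothing rather than everything.
        return []
    needle = keyword.casefold()
    return [line for line in text.splitlines() if needle in line.casefold()]


# Stand-in for text returned by the OCR step.
sample_text = "नमस्ते दुनिया\nHello world\nOCR demo"

english_matches = search_keyword(sample_text, "hello")
hindi_matches = search_keyword(sample_text, "दुनिया")
```

Using `casefold()` makes the English match case-insensitive while leaving Hindi text, which has no letter case, unaffected.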

Note: OCR accuracy depends on image quality. Blurry or hazy text may be read incorrectly.