title: OCR IITRoorkie
emoji: π
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
OCR and Keyword Search Web Application
This web application performs Optical Character Recognition (OCR) on uploaded images containing Hindi and English text, and provides keyword search over the extracted text.
Setup
Install the required dependencies: pip install -r requirements.txt. This file lists the libraries the app relies on, such as transformers, gradio, pillow, tesseract, and pytesseract.
Install Tesseract OCR. On Windows, download and install it from https://github.com/UB-Mannheim/tesseract/wiki
Update the Tesseract path in the script. This was not needed when deploying to a Hugging Face Space, but I had to set it when running the app locally on my machine.
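A minimal sketch of how the Tesseract path could be set for local runs, assuming the script calls Tesseract through pytesseract. The helper names `resolve_tesseract_cmd` and `configure_pytesseract`, the `TESSERACT_CMD` environment variable, and the default Windows install path are illustrative assumptions and are not taken from app.py.

```python
import os
import shutil

# Assumed default location used by the UB-Mannheim Windows installer;
# change this if Tesseract was installed somewhere else.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(env_var="TESSERACT_CMD"):
    """Return a path to the tesseract executable, or None if none is found."""
    # 1. An explicit override (hypothetical environment variable) wins.
    override = os.environ.get(env_var)
    if override:
        return override
    # 2. Otherwise use whatever is on PATH (typical on Linux and on Spaces).
    found = shutil.which("tesseract")
    if found:
        return found
    # 3. Fall back to the assumed Windows install location.
    if os.path.isfile(DEFAULT_WINDOWS_PATH):
        return DEFAULT_WINDOWS_PATH
    return None


def configure_pytesseract():
    """Point pytesseract at the resolved executable, if one was found."""
    import pytesseract  # imported here so the resolver works without it

    cmd = resolve_tesseract_cmd()
    if cmd is not None:
        pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling `configure_pytesseract()` once near the top of the script, before any OCR call, would keep the same code working both locally and on a Space where Tesseract is already on PATH.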
Running Locally
To run the application locally: python app.py
Deployment
To deploy on Hugging Face Spaces:
- Create a new Space on Hugging Face.
- While creating the Space, set the Space SDK to Gradio.
- Upload the app.py file, and create requirements.txt and packages.txt for the Python libraries and system packages respectively.
Usage
- Upload an image containing Hindi and English text.
- Enter a keyword to search within the extracted text.
- The application will display the extracted text and search results.
Note: OCR accuracy depends on image quality. Blurry or hazy text in the image may be read incorrectly.
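The extract-then-search flow described under Usage could be sketched roughly as follows. This is not the code from app.py: it assumes OCR is done with pytesseract using the `hin+eng` language setting (the app may rely on a transformers model instead), and the function names `extract_text` and `search_keyword` are hypothetical.

```python
import re


def extract_text(image, languages="hin+eng"):
    """Run Tesseract on a PIL image.

    Assumes the Hindi and English Tesseract language data are installed.
    """
    import pytesseract  # imported lazily so search_keyword works without it

    return pytesseract.image_to_string(image, lang=languages)


def search_keyword(text, keyword):
    """Return (line_number, line) pairs for lines that contain the keyword.

    Matching is case-insensitive and treats the keyword as literal text,
    so characters such as '.' or '*' are not interpreted as a pattern.
    """
    keyword = keyword.strip()
    if not keyword:
        return []
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]
```

For example, `search_keyword(extract_text(image), "keyword")` would return the matching lines of the extracted text together with their line numbers, which a Gradio interface could then display next to the full OCR output.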