lithish2602/OCR_2.0_GOT_MODEL

PROJECT DESCRIPTION:

This project is a web application that performs Optical Character Recognition (OCR) on uploaded images. It supports English and Hindi. It also offers keyword search: the searched word is highlighted in the extracted text.
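The keyword search and highlighting described above can be sketched as follows. This is a minimal illustration, not the project's own code. The function name `highlight_keyword`, the case-insensitive matching and the use of HTML `<mark>` tags are all assumptions. It works for Hindi (Devanagari) as well as English text because Python's `re` module operates on Unicode strings.

```python
import html
import re


def highlight_keyword(text: str, keyword: str) -> str:
    """Hypothetical helper: return `text` as HTML with every case-insensitive
    match of `keyword` wrapped in <mark> tags."""
    keyword = keyword.strip()
    if not keyword:
        # Nothing to search for: only escape the text for safe display.
        return html.escape(text)
    # re.escape makes characters such as "." or "+" match literally.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches and the match itself, so OCR output
        # containing "<" or "&" is not interpreted as markup.
        parts.append(html.escape(text[last:match.start()]))
        parts.append("<mark>" + html.escape(match.group(0)) + "</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(highlight_keyword("Invoice total: 500. TOTAL due today.", "total"))
    # Invoice <mark>total</mark>: 500. <mark>TOTAL</mark> due today.
```

The returned string could be shown in an HTML output component such as Gradio's `gr.HTML`. Escaping the OCR text before inserting the tags keeps the recognized text from being rendered as markup.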

SOFTWARE REQUIREMENTS:

Python: Python 3.8 or later is recommended.

Libraries to be installed:

pip install gradio

pip install transformers

pip install torch

pip install safetensors

pip install pillow

pip install verovio

pip install tiktoken

An NVIDIA driver must be installed.

MODEL INFO:

This application leverages the CPU version of the GOT (General OCR Theory) model for Optical Character Recognition (OCR). The model and tokenizer are sourced from the Hugging Face model hub:

Model: ucaslcl/GOT-OCR2_0
Tokenizer: ucaslcl/GOT-OCR2_0

While the code includes provisions for running the model on a GPU, this implementation uses only the CPU because of the constraints of the Hugging Face deployment environment. As a result, processing is slower than on a GPU-enabled setup. However, running on the CPU ensures compatibility with a broader range of systems, particularly those without a GPU.

USAGE:

1. (Recommended) Copy the code into Google Colab, install the prerequisite libraries, and run the code. Then upload an image and type a search keyword; the output is displayed.

2. Clone the GitHub repository.

REQUIREMENTS:

torch==2.0.1
torchvision==0.15.2
transformers==4.37.2
megfile==3.1.2
tiktoken
verovio
opencv-python
accelerate
numpy==1.26.4
shutils
pillow
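To check whether an environment matches the pinned versions in the requirements list, a small script based on the standard-library `importlib.metadata` module (available from Python 3.8) can be used. The helper name `check_pins` is hypothetical and not part of the project; it only handles plain `name==version` or bare `name` entries separated by whitespace.

```python
from importlib import metadata

# Pinned list copied from the REQUIREMENTS entry of this README.
REQUIREMENTS = """
torch==2.0.1 torchvision==0.15.2 transformers==4.37.2 megfile==3.1.2
tiktoken verovio opencv-python accelerate numpy==1.26.4 shutils pillow
"""


def check_pins(requirements: str) -> dict:
    """Hypothetical helper: report packages that are missing or mismatched.

    Each whitespace-separated entry is either "name==version" or a bare
    "name". Bare names are only checked for presence. Versions are compared
    as exact strings, so a local build such as "2.0.1+cu118" is reported as
    a mismatch against "2.0.1".
    """
    problems = {}
    for entry in requirements.split():
        name, _, wanted = entry.partition("==")
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if wanted and installed != wanted:
            problems[name] = f"installed {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(REQUIREMENTS)
    if not issues:
        print("All listed packages match.")
    for name, problem in issues.items():
        print(f"{name}: {problem}")
```

Exact string comparison is deliberately strict; on Colab, preinstalled CUDA builds of torch usually carry a "+cu…" suffix and will be flagged even when the base version matches.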