Srisurya Teja
committed on
Update README.md
# Web-Based Text Extraction and Retrieval System

This project is a web application that performs Optical Character Recognition (OCR) on images and highlights keywords within the extracted text. The system supports both English and Hindi languages, allowing users to upload images, extract text, and search for specific keywords within the extracted content.

## Features
- **Language Support**: English and Hindi
- **OCR**: Extracts text from uploaded images.
- **Keyword Search**: Highlights specified keywords in the extracted text.
- **Multiple Image Formats**: Supports PNG, JPG, and JPEG image formats.
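
As a rough illustration of the upload step and the image formats listed above, the sketch below restricts a Streamlit uploader to PNG, JPG, and JPEG files. The names `is_supported_image` and `render_uploader` are hypothetical and are not taken from the project's `app.py`; `st.file_uploader` (with its `type` parameter) and `st.image` are part of Streamlit's public API.

```python
from pathlib import Path

# Formats listed in the Features section.
SUPPORTED_EXTENSIONS = ("png", "jpg", "jpeg")


def is_supported_image(filename: str) -> bool:
    """Return True if the file name ends in a supported image extension."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


def render_uploader():
    """Hypothetical upload widget; returns a PIL image, or None if nothing was uploaded."""
    # Imported lazily so the helper above works without Streamlit installed.
    import streamlit as st
    from PIL import Image

    uploaded = st.file_uploader("Upload an image", type=list(SUPPORTED_EXTENSIONS))
    if uploaded is None:
        return None
    image = Image.open(uploaded)
    st.image(image, caption=uploaded.name)
    return image
```

Passing `type=` to the uploader already filters by extension in the browser; the separate helper is only useful if file names arrive from somewhere else.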

## Tech Stack
- **Python**
- **Streamlit**: Web interface for interactive image upload and keyword search.
- **Hugging Face Transformers**: Used for text extraction in English.
- **EasyOCR**: For Hindi text extraction from images.
- **Pillow (PIL)**: To handle image uploads.
- **PyTorch (`torch`)**: For working with the model and tokenizers.
- **NumPy**: For image processing.

## How it Works
### English OCR Flow:
1. Upload an image containing text.
2. The application uses a Hugging Face pre-trained model to extract text.
3. The extracted text is displayed, and users can search for keywords.
4. The keywords are highlighted within the extracted text.
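
The highlighting step could be implemented along the following lines. This is a minimal sketch, not the project's actual code: the function name `highlight_keywords` and the choice of HTML `<mark>` tags are assumptions. It matches keywords case-insensitively as plain substrings (no word-boundary checks), which also works for Hindi text, and escapes the surrounding text so it can be rendered as HTML.

```python
import html
import re


def highlight_keywords(text: str, keywords: list[str]) -> str:
    """Wrap every case-insensitive keyword match in <mark> tags."""
    # Drop empty keywords; try longer keywords first so they win over shorter overlaps.
    terms = sorted({k.strip() for k in keywords if k.strip()}, key=len, reverse=True)
    if not terms:
        return html.escape(text)

    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text and the match separately so the
        # inserted <mark> tags are the only raw HTML in the result.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(f"<mark>{html.escape(match.group(0))}</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

In a Streamlit app, the returned string could be displayed with `st.markdown(highlighted, unsafe_allow_html=True)`.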

### Hindi OCR Flow:
1. Upload an image with Hindi text.
2. EasyOCR is used to detect and extract Hindi text from the image.
3. Users can search for Hindi keywords, which will be highlighted in the extracted content.
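
The extraction step of the Hindi flow might look roughly like the sketch below. The function names `extract_hindi_text` and `join_fragments` are hypothetical, and running on CPU (`gpu=False`) is an assumption; `easyocr.Reader` and `Reader.readtext(..., detail=0)` are EasyOCR's documented API, and `"hi"` is its language code for Hindi.

```python
import io


def join_fragments(fragments: list[str]) -> str:
    """Join the text fragments returned by the OCR engine into one string."""
    return " ".join(f.strip() for f in fragments if f.strip())


def extract_hindi_text(image_bytes: bytes, languages=("hi", "en")) -> str:
    """Run EasyOCR on raw image bytes and return the recognized text."""
    # Heavy dependencies are imported lazily so join_fragments stays usable without them.
    import easyocr
    import numpy as np
    from PIL import Image

    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    # Assumption: CPU inference. Creating a Reader loads the models, so a real
    # app would normally create it once and reuse it.
    reader = easyocr.Reader(list(languages), gpu=False)
    # detail=0 returns only the recognized strings, without boxes or scores.
    fragments = reader.readtext(np.array(image), detail=0)
    return join_fragments(fragments)
```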

## Installation

1. **Clone the Repository**:
   ```bash
   git clone https://github.com/SrisuryaTeja/Web-Based-Text-Extraction-and-Retrieval-System
   ```

2. **Create and Activate a Virtual Environment**:
   ```bash
   python -m venv myenv
   source myenv/bin/activate  # On Windows use myenv\Scripts\activate
   ```

3. **Install Dependencies**:
   Install the required packages listed in the `requirements.txt` file:
   ```bash
   pip install -r requirements.txt
   ```

4. **Run the Application**:
   ```bash
   streamlit run app.py
   ```