---
|
language: |
|
- en |
|
license: mit |
|
library_name: transformers |
|
tags: |
|
- image-to-text |
|
- license-plate-recognition |
|
- ocr |
|
- transformers |
|
datasets: |
|
- PawanKrGunjan/license_plates |
|
metrics: |
|
- cer |
|
base_model: microsoft/trocr-base-handwritten |
|
model-index:
- name: license_plate_recognizer
  results:
  - task:
      type: image-to-text
      name: License Plate Recognition
    dataset:
      type: PawanKrGunjan/license_plates
      name: License Plates Dataset
      config: default
      split: validation
    metrics:
    - type: cer
      value: 0.0036
      name: Character Error Rate (CER)
|
pipeline_tag: image-to-text |
|
--- |
|
|
|
# License Plate Recognizer |
|
|
|
This model is a fine-tuned version of the [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten) model, specifically designed for recognizing and extracting text from license plate images. It was trained on the [PawanKrGunjan/license_plates](https://huggingface.co/datasets/PawanKrGunjan/license_plates) dataset and is ideal for OCR tasks focused on license plates. |
|
|
|
## Model Description |
|
|
|
### TrOCR (base-sized model, fine-tuned on IAM) |
|
|
|
This model is based on the TrOCR model, which was introduced in the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Li et al. The original TrOCR model was first released in [this repository](https://github.com/microsoft/unilm/tree/master/trocr). |
|
|
|
The TrOCR model utilizes an encoder-decoder architecture: |
|
- **Encoder:** A Transformer-based image encoder initialized from BEiT weights. |
|
- **Decoder:** A Transformer-based text decoder initialized from RoBERTa weights. |
|
|
|
The model processes images as sequences of patches (16x16 resolution) and generates text autoregressively. |
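The patch arithmetic above is easy to sketch. The 384×384 input resolution used below is the TrOCR processor default, which is an assumption here rather than something stated in this card:

```python
# How many 16x16 patch tokens the encoder sees for one input image.
# TrOCR processors resize inputs to 384x384 by default (assumed here;
# check the processor's image size config for the exact value).
image_size = 384
patch_size = 16

patches_per_side = image_size // patch_size  # 24 patches along each axis
num_patches = patches_per_side ** 2          # 576 patch tokens total

print(num_patches)  # 576
```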
|
|
|
The base checkpoint was already fine-tuned on the IAM dataset of handwritten text, giving it strong character-level recognition; this model builds on that checkpoint with further fine-tuning on license plate images, making it particularly effective for recognizing license plate text.
|
|
|
### Fine-Tuning Details |
|
|
|
- **Base Model:** The model was fine-tuned from the [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten) checkpoint.
|
- **Dataset:** Fine-tuning was performed using the [PawanKrGunjan/license_plates](https://huggingface.co/datasets/PawanKrGunjan/license_plates) dataset. |
|
|
|
## Intended Uses & Limitations |
|
|
|
### Use Cases |
|
|
|
- **License Plate Recognition:** Extract text from license plate images for use in various automated systems. |
|
- **Automated Surveillance:** Suitable for integration into automated surveillance systems for real-time monitoring. |
|
|
|
### Limitations |
|
|
|
- **Environmental Constraints:** Performance may degrade in low-light conditions or with low-resolution images. |
|
- **Regional Variability:** The model may struggle with license plate designs that differ significantly from those in the training dataset. |
|
|
|
# How to Use |
|
|
|
Here’s an example of how to use the model in Python: |
|
|
|
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the processor and model
processor = TrOCRProcessor.from_pretrained("PawanKrGunjan/license_plate_recognizer")
model = VisionEncoderDecoderModel.from_pretrained("PawanKrGunjan/license_plate_recognizer")

# Load an image of a license plate (replace with the path to your own image)
image = Image.open("license_plate.jpg").convert("RGB")

# Process the image into pixel values
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate the text prediction
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
```
|
|
|
# Training procedure |
|
|
|
## Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- **learning_rate**: 2e-05 |
|
- **train_batch_size**: 8 |
|
- **eval_batch_size**: 8 |
|
- **seed**: 42 |
|
- **optimizer**: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- **lr_scheduler_type**: linear |
|
- **num_epochs**: 20 |
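With a linear scheduler and no warmup (no warmup setting is listed, so none is assumed), the learning rate decays from 2e-05 to zero over the 7,940 total optimizer steps (20 epochs × 397 steps per epoch). A minimal sketch of that decay:

```python
def linear_lr(step, total_steps=7940, base_lr=2e-5):
    """Linearly decayed learning rate, assuming no warmup phase."""
    return base_lr * (1.0 - step / total_steps)

print(linear_lr(0))     # full learning rate at the first step
print(linear_lr(3970))  # half the learning rate at the midpoint (epoch 10)
```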
|
|
|
## Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | CER    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.1379        | 1.0   | 397  | 0.0408          | 0.0124 |
| 0.0817        | 2.0   | 794  | 0.0313          | 0.0093 |
| 0.0641        | 3.0   | 1191 | 0.0253          | 0.0089 |
| 0.0431        | 4.0   | 1588 | 0.0221          | 0.0089 |
| 0.0246        | 5.0   | 1985 | 0.0233          | 0.0067 |
| 0.0192        | 6.0   | 2382 | 0.0193          | 0.0053 |
| 0.0205        | 7.0   | 2779 | 0.0221          | 0.0062 |
| 0.0158        | 8.0   | 3176 | 0.0134          | 0.0062 |
| 0.0074        | 9.0   | 3573 | 0.0086          | 0.0040 |
| 0.0074        | 10.0  | 3970 | 0.0056          | 0.0027 |
| 0.0036        | 11.0  | 4367 | 0.0033          | 0.0018 |
| 0.0079        | 12.0  | 4764 | 0.0075          | 0.0049 |
| 0.002         | 13.0  | 5161 | 0.0039          | 0.0027 |
| 0.0004        | 14.0  | 5558 | 0.0028          | 0.0022 |
| 0.0001        | 15.0  | 5955 | 0.0039          | 0.0027 |
| 0.0001        | 16.0  | 6352 | 0.0047          | 0.0035 |
| 0.0011        | 17.0  | 6749 | 0.0041          | 0.0027 |
| 0.0001        | 18.0  | 7146 | 0.0053          | 0.0018 |
| 0.0001        | 19.0  | 7543 | 0.0047          | 0.0018 |
| 0.0001        | 20.0  | 7940 | 0.0047          | 0.0018 |
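The CER reported above is the character-level edit (Levenshtein) distance between the predicted and reference strings, divided by the reference length. A self-contained sketch, using made-up plate strings for illustration:

```python
def cer(prediction: str, reference: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    # Dynamic-programming edit distance over characters.
    prev = list(range(len(reference) + 1))
    for i, p in enumerate(prediction, start=1):
        curr = [i]
        for j, r in enumerate(reference, start=1):
            cost = 0 if p == r else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(reference)

print(cer("A8C 1234", "ABC 1234"))  # 0.125 (one substitution over 8 chars)
```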
|
|
|
|
|
## Framework versions |
|
|
|
- Transformers 4.42.3 |
|
- Pytorch 2.1.2 |
|
- Datasets 2.20.0 |
|
- Tokenizers 0.19.1 |