---
language:
- en
license: mit
library_name: transformers
tags:
- image-to-text
- license-plate-recognition
- ocr
- transformers
datasets:
- PawanKrGunjan/license_plates
metrics:
- cer
base_model: microsoft/trocr-base-handwritten
model-index:
- name: license_plate_recognizer
  results:
  - task:
      type: image-to-text
      name: License Plate Recognition
    dataset:
      type: PawanKrGunjan/license_plates
      name: License Plates Dataset
      config: default
      split: validation
    metrics:
    - type: cer
      value: 0.0036
      name: Character Error Rate (CER)
pipeline_tag: image-to-text
---

# License Plate Recognizer

This model is a fine-tuned version of the [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten) model, specifically designed for recognizing and extracting text from license plate images. It was trained on the [PawanKrGunjan/license_plates](https://huggingface.co/datasets/PawanKrGunjan/license_plates) dataset and is ideal for OCR tasks focused on license plates.

## Model Description

### TrOCR (base-sized model, fine-tuned on IAM)

This model is based on the TrOCR model, which was introduced in the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Li et al. The original TrOCR model was first released in [this repository](https://github.com/microsoft/unilm/tree/master/trocr).

The TrOCR model utilizes an encoder-decoder architecture:
- **Encoder:** A Transformer-based image encoder initialized from BEiT weights.
- **Decoder:** A Transformer-based text decoder initialized from RoBERTa weights.

The model processes images as sequences of patches (16x16 resolution) and generates text autoregressively.
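As a concrete illustration of the patch sequence length (assuming the 384×384 input resolution used by the TrOCR base checkpoints, which is not stated explicitly in this card):

```python
# TrOCR-base resizes input images to 384x384 and splits them into
# non-overlapping 16x16 patches (standard ViT/BEiT-style tokenization).
image_size = 384
patch_size = 16

patches_per_side = image_size // patch_size   # 24 patches along each side
num_patches = patches_per_side ** 2           # patch embeddings fed to the encoder

print(num_patches)  # 576
```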

The base checkpoint was fine-tuned on the IAM handwriting dataset, which gives it strong general OCR performance; this model builds on that checkpoint with a further fine-tuning pass on license plate images, making it particularly effective for recognizing plate text.

### Fine-Tuning Details

- **Base Model:** The model was fine-tuned from the [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten) model.
- **Dataset:** Fine-tuning was performed using the [PawanKrGunjan/license_plates](https://huggingface.co/datasets/PawanKrGunjan/license_plates) dataset.

## Intended Uses & Limitations

### Use Cases

- **License Plate Recognition:** Extract text from license plate images for use in various automated systems.
- **Automated Surveillance:** Suitable for integration into automated surveillance systems for real-time monitoring.

### Limitations

- **Environmental Constraints:** Performance may degrade in low-light conditions or with low-resolution images.
- **Regional Variability:** The model may struggle with license plate designs that differ significantly from those in the training dataset.

# How to Use

Here’s an example of how to use the model in Python:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests

# Load the processor and model
processor = TrOCRProcessor.from_pretrained("PawanKrGunjan/license_plate_recognizer")
model = VisionEncoderDecoderModel.from_pretrained("PawanKrGunjan/license_plate_recognizer")

# Load an image of a license plate
# Note: this is a signed sample URL from the dataset viewer and may expire;
# substitute any license plate image of your own.
image_url = "https://datasets-server.huggingface.co/assets/PawanKrGunjan/license_plates/--/c1a289cb616808b2a834fae90d9625c2f78b82c9/--/default/train/34/image/image.jpg?Expires=1723689029&Signature=jlu~8q7l2MT2IhbS5UttYLkPaMX3416a9CByGBa9M5QKNqi9ezSTYLkDsliKKgO2c-TbiJ8LsEAOB8jmcXwQkN6eNBjrJpnyGqBZ7T99P-cXk5XwHiJa27bn6jINvBUBVID8ganhqBv-DubyyM4RcksxyjZNAE7yatBTBbaDk1-mno5pbL7fpFb~gHfMvMGalPWa-vO3teeoS0yHhp5yNzSjObmwzqn42bZpCFA3dleRPnzikyKPR3OzFK1BaPyr2bzJsLUlg3H7E8c3NGz~ryLjBREa2KpyM2X0JkhzvT0fEGsdaiyN36Tkqoi2aeH~KM8YzztD7W-jSH83dckdxw__&Key-Pair-Id=K3EI6M078Z3AC3"
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

# Process the image
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate the text prediction
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
```

# Training procedure

## Training hyperparameters

The following hyperparameters were used during training:
- **learning_rate**: 2e-05
- **train_batch_size**: 8
- **eval_batch_size**: 8
- **seed**: 42
- **optimizer**: Adam with betas=(0.9,0.999) and epsilon=1e-08
- **lr_scheduler_type**: linear
- **num_epochs**: 20
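The hyperparameters above can be expressed as a `Seq2SeqTrainingArguments` configuration. This is a hedged reconstruction, not the original training script: only the listed values come from the actual run, while `output_dir`, `predict_with_generate`, and `eval_strategy` are illustrative assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the training configuration;
# only the hyperparameter values listed above are from the actual run.
training_args = Seq2SeqTrainingArguments(
    output_dir="./license_plate_recognizer",  # assumed path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    num_train_epochs=20,
    lr_scheduler_type="linear",               # Adam defaults match betas/epsilon above
    predict_with_generate=True,  # assumed: generation is needed to compute CER
    eval_strategy="epoch",       # assumed from the per-epoch results table
)
```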

## Training results

| Training Loss | Epoch | Step | Validation Loss | CER    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.1379        | 1.0   | 397  | 0.0408          | 0.0124 |
| 0.0817        | 2.0   | 794  | 0.0313          | 0.0093 |
| 0.0641        | 3.0   | 1191 | 0.0253          | 0.0089 |
| 0.0431        | 4.0   | 1588 | 0.0221          | 0.0089 |
| 0.0246        | 5.0   | 1985 | 0.0233          | 0.0067 |
| 0.0192        | 6.0   | 2382 | 0.0193          | 0.0053 |
| 0.0205        | 7.0   | 2779 | 0.0221          | 0.0062 |
| 0.0158        | 8.0   | 3176 | 0.0134          | 0.0062 |
| 0.0074        | 9.0   | 3573 | 0.0086          | 0.0040 |
| 0.0074        | 10.0  | 3970 | 0.0056          | 0.0027 |
| 0.0036        | 11.0  | 4367 | 0.0033          | 0.0018 |
| 0.0079        | 12.0  | 4764 | 0.0075          | 0.0049 |
| 0.002         | 13.0  | 5161 | 0.0039          | 0.0027 |
| 0.0004        | 14.0  | 5558 | 0.0028          | 0.0022 |
| 0.0001        | 15.0  | 5955 | 0.0039          | 0.0027 |
| 0.0001        | 16.0  | 6352 | 0.0047          | 0.0035 |
| 0.0011        | 17.0  | 6749 | 0.0041          | 0.0027 |
| 0.0001        | 18.0  | 7146 | 0.0053          | 0.0018 |
| 0.0001        | 19.0  | 7543 | 0.0047          | 0.0018 |
| 0.0001        | 20.0  | 7940 | 0.0047          | 0.0018 |
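For reference, CER is the character-level Levenshtein (edit) distance between the prediction and the ground truth, divided by the number of reference characters. A minimal self-contained sketch of the metric (libraries such as `jiwer` or `evaluate` compute the same quantity):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # Classic dynamic-programming Levenshtein distance over characters,
    # keeping only one previous row to stay O(n) in memory.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / max(m, 1)

# One wrong character out of ten on a hypothetical plate string:
print(cer("KA01AB1234", "KA01AB1284"))  # 0.1
```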


## Framework versions

- Transformers 4.42.3
- PyTorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1