metadata
language:
  - en
license: mit
library_name: transformers
tags:
  - image-to-text
  - license-plate-recognition
  - ocr
  - transformers
datasets:
  - PawanKrGunjan/license_plates
metrics:
  - cer
base_model: microsoft/trocr-base-handwritten
model-index:
  - name: license_plate_recognizer
    results:
      - task:
          type: image-to-text
          name: License Plate Recognition
        dataset:
          type: PawanKrGunjan/license_plates
          name: License Plates Dataset
          config: default
          split: validation
        metrics:
          - type: cer
            value: 0.0036
            name: Character Error Rate (CER)
pipeline_tag: image-to-text

License Plate Recognizer

This model is a fine-tuned version of microsoft/trocr-base-handwritten for recognizing and extracting text from license plate images. It was trained on the PawanKrGunjan/license_plates dataset and is intended for license plate OCR.

Model Description

TrOCR (base-sized model, fine-tuned on IAM)

This model is based on TrOCR, introduced in the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al. The original TrOCR model was first released in the authors' official repository.

The TrOCR model utilizes an encoder-decoder architecture:

  • Encoder: A Transformer-based image encoder initialized from BEiT weights.
  • Decoder: A Transformer-based text decoder initialized from RoBERTa weights.
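
The encoder runs once per image, and the decoder is then called repeatedly to emit one token at a time. The plain-Python sketch below shows that split as a simplified greedy loop. `encode` and `decode_step` are hypothetical stand-ins for the encoder and decoder, not Transformers APIs. The real `model.generate` also handles key/value caching, beam search and the checkpoint's generation settings.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    encode: Callable[[object], object],
    decode_step: Callable[[object, List[int]], Sequence[float]],
    image: object,
    bos_id: int,
    eos_id: int,
    max_new_tokens: int = 16,
) -> List[int]:
    """Greedy autoregressive decoding with a separate encoder and decoder."""
    memory = encode(image)  # encoder runs once; its output is reused at every step
    tokens = [bos_id]
    for _ in range(max_new_tokens):
        # Decoder scores every vocabulary entry given the encoder output
        # and all tokens generated so far.
        scores = decode_step(memory, tokens)
        next_id = max(range(len(scores)), key=lambda i: scores[i])
        tokens.append(next_id)
        if next_id == eos_id:  # stop once the end-of-sequence token is produced
            break
    return tokens


# Toy stand-ins (hypothetical): the "decoder" replays a fixed script over a 5-token vocabulary.
SCRIPT = [3, 4, 2]

def toy_encode(image: object) -> object:
    return image

def toy_step(memory: object, tokens: List[int]) -> List[float]:
    target = SCRIPT[min(len(tokens) - 1, len(SCRIPT) - 1)]
    return [1.0 if i == target else 0.0 for i in range(5)]

print(greedy_decode(toy_encode, toy_step, image=None, bos_id=0, eos_id=2))
```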

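The encoder receives each image as a sequence of fixed-size 16x16-pixel patches. These patches are linearly embedded and combined with position embeddings. The decoder then generates the output text autoregressively, one token at a time.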
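
Below is a minimal NumPy sketch of the patch-splitting step. It assumes an RGB input already resized to 384x384; the processor's actual target size can be checked via `processor.image_processor.size`. The sketch only flattens the patches. The learned linear projection and position embeddings that the real encoder applies afterwards are omitted.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patch x patch tiles.

    Returns an array of shape (num_patches, patch * patch * C), ordered row by row.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    # (H, W, C) -> (H/p, p, W/p, p, C): split each spatial axis into (tile index, offset)
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    # Bring the two tile indices together: (H/p, W/p, p, p, C)
    tiles = tiles.transpose(0, 2, 1, 3, 4)
    # One row per patch, each row holding that patch's pixels
    return tiles.reshape(-1, patch * patch * c)


image = np.zeros((384, 384, 3), dtype=np.float32)  # assumed 384x384 RGB input
patches = patchify(image)
print(patches.shape)  # (576, 768): 24 x 24 patches of 16 * 16 * 3 values
```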

The base checkpoint, microsoft/trocr-base-handwritten, was fine-tuned on the IAM handwriting dataset. This model fine-tunes that checkpoint further on the PawanKrGunjan/license_plates dataset to adapt it to license plate text.

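Fine-Tuning Details

The model was fine-tuned from microsoft/trocr-base-handwritten on the PawanKrGunjan/license_plates dataset. The hyperparameters and per-epoch results are listed under Training procedure.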
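
Label preprocessing is not documented for this model. A common approach when fine-tuning a VisionEncoderDecoderModel with Hugging Face tooling works as follows:

  • Pad each tokenized target string to a fixed length.
  • Replace the padding IDs with -100, which is the default `ignore_index` of PyTorch's cross-entropy loss. This stops padded positions from contributing to the loss.

Below is a minimal sketch of that step. The token IDs are hypothetical, and `prepare_labels` is an illustrative helper, not a library function.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def prepare_labels(token_ids: List[int], pad_token_id: int, max_length: int) -> List[int]:
    """Truncate or pad token IDs to max_length and mask padding with IGNORE_INDEX."""
    ids = token_ids[:max_length]
    ids = ids + [pad_token_id] * (max_length - len(ids))
    # Padded positions become IGNORE_INDEX so the loss skips them
    return [IGNORE_INDEX if t == pad_token_id else t for t in ids]


# Hypothetical IDs: 0 = start, 2 = end, 1 = padding, the rest are text tokens.
print(prepare_labels([0, 512, 87, 2], pad_token_id=1, max_length=8))
```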

Intended Uses & Limitations

Use Cases

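  • License Plate Recognition: Extract text from cropped license plate images for use in downstream automated systems. The model reads the plate text but does not locate plates within a larger scene.
  • Automated Surveillance: Can be integrated into automated surveillance pipelines. Whether it meets real-time monitoring requirements depends on the deployment hardware and latency budget.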
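
When predictions feed an automated system, such as lookups against a list of known plates, it can help to normalize the generated text before comparing it. Below is a minimal sketch. It assumes plates use only Latin letters and digits. `normalize_plate` and the watchlist are hypothetical, not part of this model or of Transformers.

```python
import re


def normalize_plate(text: str) -> str:
    """Uppercase the prediction and drop everything except A-Z and 0-9.

    Assumes Latin alphanumeric plates; adjust the pattern for other scripts.
    """
    return re.sub(r"[^A-Z0-9]", "", text.upper())


# Hypothetical watchlist of normalized plate numbers.
watchlist = {"AB12CD345"}

prediction = "ab 12 cd-345"  # e.g. a string returned by processor.batch_decode
print(normalize_plate(prediction) in watchlist)
```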

Limitations

  • Environmental Constraints: Performance may degrade in low-light conditions or with low-resolution images.
  • Regional Variability: The model may struggle with license plate designs that differ significantly from those in the training dataset.

How to Use

Here’s an example of how to use the model in Python:

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from datasets import load_dataset

# Load the processor and model
processor = TrOCRProcessor.from_pretrained("PawanKrGunjan/license_plate_recognizer")
model = VisionEncoderDecoderModel.from_pretrained("PawanKrGunjan/license_plate_recognizer")

# Load an image of a license plate: example 34 of the dataset's train split.
# To use your own cropped plate image instead ("plate.jpg" is a hypothetical path):
#   from PIL import Image
#   image = Image.open("plate.jpg").convert("RGB")
dataset = load_dataset("PawanKrGunjan/license_plates", split="train")
image = dataset[34]["image"].convert("RGB")

# Process the image
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate the text prediction
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
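
The `linear` scheduler decays the learning rate from 2e-05 toward zero over the course of training. Below is a minimal sketch of that schedule. It rests on two assumptions:

  • No warmup was used, since none is listed.
  • Each epoch has 397 optimizer steps, taken from the Step column of the training results.

```python
BASE_LR = 2e-05
STEPS_PER_EPOCH = 397  # from the Step column of the training results
NUM_EPOCHS = 20
WARMUP_STEPS = 0  # assumption: no warmup is listed

TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 7940, the final step in the results table


def linear_lr(step: int, base_lr: float = BASE_LR,
              total: int = TOTAL_STEPS, warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup (if any), then linear decay to zero at `total` steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


for epoch in (1, 10, 20):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch:2d}, step {step:4d}: lr = {linear_lr(step):.2e}")
```

Under a further assumption of single-device training with no gradient accumulation, 397 steps at batch size 8 implies a training split of roughly 3,169 to 3,176 images.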

Training results

Training Loss | Epoch | Step | Validation Loss | CER
------------- | ----- | ---- | --------------- | ------
0.1379        | 1.0   | 397  | 0.0408          | 0.0124
0.0817        | 2.0   | 794  | 0.0313          | 0.0093
0.0641        | 3.0   | 1191 | 0.0253          | 0.0089
0.0431        | 4.0   | 1588 | 0.0221          | 0.0089
0.0246        | 5.0   | 1985 | 0.0233          | 0.0067
0.0192        | 6.0   | 2382 | 0.0193          | 0.0053
0.0205        | 7.0   | 2779 | 0.0221          | 0.0062
0.0158        | 8.0   | 3176 | 0.0134          | 0.0062
0.0074        | 9.0   | 3573 | 0.0086          | 0.0040
0.0074        | 10.0  | 3970 | 0.0056          | 0.0027
0.0036        | 11.0  | 4367 | 0.0033          | 0.0018
0.0079        | 12.0  | 4764 | 0.0075          | 0.0049
0.002         | 13.0  | 5161 | 0.0039          | 0.0027
0.0004        | 14.0  | 5558 | 0.0028          | 0.0022
0.0001        | 15.0  | 5955 | 0.0039          | 0.0027
0.0001        | 16.0  | 6352 | 0.0047          | 0.0035
0.0011        | 17.0  | 6749 | 0.0041          | 0.0027
0.0001        | 18.0  | 7146 | 0.0053          | 0.0018
0.0001        | 19.0  | 7543 | 0.0047          | 0.0018
0.0001        | 20.0  | 7940 | 0.0047          | 0.0018
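
CER (character error rate) is defined as:

  CER = (S + D + I) / N

  • S, D and I are the numbers of character substitutions, deletions and insertions needed to turn the prediction into the reference.
  • N is the number of reference characters.

The card does not say which implementation produced the reported values. Below is a minimal pure-Python sketch of the standard corpus-level definition, which divides total edits by total reference characters. The plate strings are hypothetical.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of substitutions, deletions, and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], predictions: Sequence[str]) -> float:
    """Corpus-level CER: total edit distance divided by total reference length."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, p) for r, p in zip(references, predictions))
    return total_edits / total_chars


# Hypothetical plates: one wrong character out of 6 + 7 reference characters.
print(cer(["ABC123", "XY45ZZ9"], ["ABC128", "XY45ZZ9"]))
```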

Framework versions

  • Transformers 4.42.3
  • Pytorch 2.1.2
  • Datasets 2.20.0
  • Tokenizers 0.19.1
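
To reproduce this environment, it can help to confirm that the installed packages match the versions above. Below is a minimal standard-library sketch. The `compare_versions` helper is hypothetical. PyTorch is published on PyPI as `torch`, and local build tags such as "+cu121" are ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed above; "torch" is the PyPI distribution name for PyTorch.
PINNED: Dict[str, str] = {
    "transformers": "4.42.3",
    "torch": "2.1.2",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(pinned: Dict[str, str] = PINNED) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each package to (expected version, installed version or None, match flag)."""
    report: Dict[str, Tuple[str, Optional[str], bool]] = {}
    for package, expected in pinned.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None
        # Drop local build tags such as "+cu121" before comparing
        matches = installed is not None and installed.split("+")[0] == expected
        report[package] = (expected, installed, matches)
    return report


if __name__ == "__main__":
    for package, (expected, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{package}: expected {expected}, installed {installed} ({status})")
```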