---
language:
- en
license: mit
library_name: transformers
tags:
- image-to-text
- license-plate-recognition
- ocr
- transformers
datasets:
- PawanKrGunjan/license_plates
metrics:
- cer
base_model: microsoft/trocr-base-handwritten
model-index:
- name: license_plate_recognizer
  results:
  - task:
      type: image-to-text
      name: License Plate Recognition
    dataset:
      type: PawanKrGunjan/license_plates
      name: License Plates Dataset
      config: default
      split: validation
    metrics:
    - type: cer
      value: 0.0036
      name: Character Error Rate (CER)
pipeline_tag: image-to-text
---

# License Plate Recognizer

This model is a fine-tuned version of the [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten) model, specifically designed for recognizing and extracting text from license plate images. It was trained on the [PawanKrGunjan/license_plates](https://huggingface.co/datasets/PawanKrGunjan/license_plates) dataset and is ideal for OCR tasks focused on license plates.

## Model Description

### TrOCR (base-sized model, fine-tuned on IAM)

This model is based on the TrOCR model, which was introduced in the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Li et al. The original TrOCR model was first released in [this repository](https://github.com/microsoft/unilm/tree/master/trocr).

The TrOCR model utilizes an encoder-decoder architecture:
- **Encoder:** A Transformer-based image encoder initialized from BEiT weights.
- **Decoder:** A Transformer-based text decoder initialized from RoBERTa weights.

The model processes images as sequences of patches (16x16 resolution) and generates text autoregressively.
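As a concrete illustration of the patch sequence length (assuming the 384×384 input resolution used by the TrOCR base checkpoints, which is not stated explicitly in this card):

```python
# TrOCR-base resizes input images to 384x384 and splits them into
# non-overlapping 16x16 patches (standard ViT/BEiT-style tokenization).
image_size = 384
patch_size = 16

patches_per_side = image_size // patch_size   # 24 patches along each side
num_patches = patches_per_side ** 2           # patch embeddings fed to the encoder

print(num_patches)  # 576
```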

The base checkpoint was fine-tuned on the IAM handwriting dataset, which gives it strong general OCR performance; this model builds on that checkpoint with a further fine-tuning pass on license plate images, making it particularly effective for recognizing plate text.

### Fine-Tuning Details

- **Base Model:** The model was fine-tuned from the [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten) model.
- **Dataset:** Fine-tuning was performed using the [PawanKrGunjan/license_plates](https://huggingface.co/datasets/PawanKrGunjan/license_plates) dataset.

## Intended Uses & Limitations

### Use Cases

- **License Plate Recognition:** Extract text from license plate images for use in various automated systems.
- **Automated Surveillance:** Suitable for integration into automated surveillance systems for real-time monitoring.

### Limitations

- **Environmental Constraints:** Performance may degrade in low-light conditions or with low-resolution images.
- **Regional Variability:** The model may struggle with license plate designs that differ significantly from those in the training dataset.

# How to Use

Here’s an example of how to use the model in Python:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests

# Load the processor and model
processor = TrOCRProcessor.from_pretrained("PawanKrGunjan/license_plate_recognizer")
model = VisionEncoderDecoderModel.from_pretrained("PawanKrGunjan/license_plate_recognizer")

# Load an image of a license plate
# Note: this is a signed sample URL from the dataset viewer and may expire;
# substitute any license plate image of your own.
image_url = "https://datasets-server.huggingface.co/assets/PawanKrGunjan/license_plates/--/c1a289cb616808b2a834fae90d9625c2f78b82c9/--/default/train/34/image/image.jpg?Expires=1723689029&Signature=jlu~8q7l2MT2IhbS5UttYLkPaMX3416a9CByGBa9M5QKNqi9ezSTYLkDsliKKgO2c-TbiJ8LsEAOB8jmcXwQkN6eNBjrJpnyGqBZ7T99P-cXk5XwHiJa27bn6jINvBUBVID8ganhqBv-DubyyM4RcksxyjZNAE7yatBTBbaDk1-mno5pbL7fpFb~gHfMvMGalPWa-vO3teeoS0yHhp5yNzSjObmwzqn42bZpCFA3dleRPnzikyKPR3OzFK1BaPyr2bzJsLUlg3H7E8c3NGz~ryLjBREa2KpyM2X0JkhzvT0fEGsdaiyN36Tkqoi2aeH~KM8YzztD7W-jSH83dckdxw__&Key-Pair-Id=K3EI6M078Z3AC3"
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

# Process the image
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate the text prediction
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
```

# Training procedure

## Training hyperparameters

The following hyperparameters were used during training:
- **learning_rate**: 2e-05
- **train_batch_size**: 8
- **eval_batch_size**: 8
- **seed**: 42
- **optimizer**: Adam with betas=(0.9,0.999) and epsilon=1e-08
- **lr_scheduler_type**: linear
- **num_epochs**: 20
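The hyperparameters above can be expressed as a `Seq2SeqTrainingArguments` configuration. This is a hedged reconstruction, not the original training script: only the listed values come from the actual run, while `output_dir`, `predict_with_generate`, and `eval_strategy` are illustrative assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the training configuration;
# only the hyperparameter values listed above are from the actual run.
training_args = Seq2SeqTrainingArguments(
    output_dir="./license_plate_recognizer",  # assumed path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    num_train_epochs=20,
    lr_scheduler_type="linear",               # Adam defaults match betas/epsilon above
    predict_with_generate=True,  # assumed: generation is needed to compute CER
    eval_strategy="epoch",       # assumed from the per-epoch results table
)
```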

## Training results

| Training Loss | Epoch | Step | Validation Loss | CER    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.1379        | 1.0   | 397  | 0.0408          | 0.0124 |
| 0.0817        | 2.0   | 794  | 0.0313          | 0.0093 |
| 0.0641        | 3.0   | 1191 | 0.0253          | 0.0089 |
| 0.0431        | 4.0   | 1588 | 0.0221          | 0.0089 |
| 0.0246        | 5.0   | 1985 | 0.0233          | 0.0067 |
| 0.0192        | 6.0   | 2382 | 0.0193          | 0.0053 |
| 0.0205        | 7.0   | 2779 | 0.0221          | 0.0062 |
| 0.0158        | 8.0   | 3176 | 0.0134          | 0.0062 |
| 0.0074        | 9.0   | 3573 | 0.0086          | 0.0040 |
| 0.0074        | 10.0  | 3970 | 0.0056          | 0.0027 |
| 0.0036        | 11.0  | 4367 | 0.0033          | 0.0018 |
| 0.0079        | 12.0  | 4764 | 0.0075          | 0.0049 |
| 0.002         | 13.0  | 5161 | 0.0039          | 0.0027 |
| 0.0004        | 14.0  | 5558 | 0.0028          | 0.0022 |
| 0.0001        | 15.0  | 5955 | 0.0039          | 0.0027 |
| 0.0001        | 16.0  | 6352 | 0.0047          | 0.0035 |
| 0.0011        | 17.0  | 6749 | 0.0041          | 0.0027 |
| 0.0001        | 18.0  | 7146 | 0.0053          | 0.0018 |
| 0.0001        | 19.0  | 7543 | 0.0047          | 0.0018 |
| 0.0001        | 20.0  | 7940 | 0.0047          | 0.0018 |
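For reference, CER is the character-level Levenshtein (edit) distance between the prediction and the ground truth, divided by the number of reference characters. A minimal self-contained sketch of the metric (libraries such as `jiwer` or `evaluate` compute the same quantity):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # Classic dynamic-programming Levenshtein distance over characters,
    # keeping only one previous row to stay O(n) in memory.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / max(m, 1)

# One wrong character out of ten on a hypothetical plate string:
print(cer("KA01AB1234", "KA01AB1284"))  # 0.1
```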


## Framework versions

- Transformers 4.42.3
- PyTorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1