---
|
language: |
|
- en |
|
license: mit |
|
library_name: transformers |
|
tags: |
|
- image-to-text |
|
- license-plate-recognition |
|
- ocr |
|
- transformers |
|
datasets: |
|
- PawanKrGunjan/license_plates |
|
metrics: |
|
- cer |
|
base_model: microsoft/trocr-base-handwritten |
|
model-index:
- name: license_plate_recognizer
  results:
  - task:
      type: image-to-text
      name: License Plate Recognition
    dataset:
      type: PawanKrGunjan/license_plates
      name: License Plates Dataset
      config: default
      split: validation
    metrics:
    - type: cer
      value: 0.0036
      name: Character Error Rate (CER)
|
pipeline_tag: image-to-text |
|
--- |
|
|
|
# License Plate Recognizer |
|
|
|
This model is a fine-tuned version of the [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten) model, specifically designed for recognizing and extracting text from license plate images. It was trained on the [PawanKrGunjan/license_plates](https://huggingface.co/datasets/PawanKrGunjan/license_plates) dataset and is ideal for OCR tasks focused on license plates. |
|
|
|
## Model Description |
|
|
|
### TrOCR (base-sized model, fine-tuned on IAM) |
|
|
|
This model is based on the TrOCR model, which was introduced in the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Li et al. The original TrOCR model was first released in [this repository](https://github.com/microsoft/unilm/tree/master/trocr). |
|
|
|
The TrOCR model utilizes an encoder-decoder architecture: |
|
- **Encoder:** A Transformer-based image encoder initialized from BEiT weights. |
|
- **Decoder:** A Transformer-based text decoder initialized from RoBERTa weights. |
|
|
|
The model processes images as sequences of patches (16x16 resolution) and generates text autoregressively. |
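The patch arithmetic above is easy to sketch. The 384×384 input resolution used below is the TrOCR processor default, which is an assumption here rather than something stated in this card:

```python
# How many 16x16 patch tokens the encoder sees for one input image.
# TrOCR processors resize inputs to 384x384 by default (assumed here;
# check the processor's image size config for the exact value).
image_size = 384
patch_size = 16

patches_per_side = image_size // patch_size  # 24 patches along each axis
num_patches = patches_per_side ** 2          # 576 patch tokens total

print(num_patches)  # 576
```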
|
|
|
The base checkpoint was already fine-tuned on the IAM dataset of handwritten text, giving it strong character-level recognition; this model builds on that checkpoint with further fine-tuning on license plate images, making it particularly effective for recognizing license plate text.
|
|
|
### Fine-Tuning Details |
|
|
|
- **Base Model:** The model was fine-tuned from the [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten) checkpoint.
|
- **Dataset:** Fine-tuning was performed using the [PawanKrGunjan/license_plates](https://huggingface.co/datasets/PawanKrGunjan/license_plates) dataset. |
|
|
|
## Intended Uses & Limitations |
|
|
|
### Use Cases |
|
|
|
- **License Plate Recognition:** Extract text from license plate images for use in various automated systems. |
|
- **Automated Surveillance:** Suitable for integration into automated surveillance systems for real-time monitoring. |
|
|
|
### Limitations |
|
|
|
- **Environmental Constraints:** Performance may degrade in low-light conditions or with low-resolution images. |
|
- **Regional Variability:** The model may struggle with license plate designs that differ significantly from those in the training dataset. |
|
|
|
# How to Use |
|
|
|
Here’s an example of how to use the model in Python: |
|
|
|
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the processor and model
processor = TrOCRProcessor.from_pretrained("PawanKrGunjan/license_plate_recognizer")
model = VisionEncoderDecoderModel.from_pretrained("PawanKrGunjan/license_plate_recognizer")

# Load an image of a license plate (replace with the path to your own image)
image = Image.open("license_plate.jpg").convert("RGB")

# Process the image into pixel values
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate the text prediction
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
```
|
|
|
# Training procedure |
|
|
|
## Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- **learning_rate**: 2e-05 |
|
- **train_batch_size**: 8 |
|
- **eval_batch_size**: 8 |
|
- **seed**: 42 |
|
- **optimizer**: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- **lr_scheduler_type**: linear |
|
- **num_epochs**: 20 |
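With a linear scheduler and no warmup (no warmup setting is listed, so none is assumed), the learning rate decays from 2e-05 to zero over the 7,940 total optimizer steps (20 epochs × 397 steps per epoch). A minimal sketch of that decay:

```python
def linear_lr(step, total_steps=7940, base_lr=2e-5):
    """Linearly decayed learning rate, assuming no warmup phase."""
    return base_lr * (1.0 - step / total_steps)

print(linear_lr(0))     # full learning rate at the first step
print(linear_lr(3970))  # half the learning rate at the midpoint (epoch 10)
```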
|
|
|
## Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | CER    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.1379        | 1.0   | 397  | 0.0408          | 0.0124 |
| 0.0817        | 2.0   | 794  | 0.0313          | 0.0093 |
| 0.0641        | 3.0   | 1191 | 0.0253          | 0.0089 |
| 0.0431        | 4.0   | 1588 | 0.0221          | 0.0089 |
| 0.0246        | 5.0   | 1985 | 0.0233          | 0.0067 |
| 0.0192        | 6.0   | 2382 | 0.0193          | 0.0053 |
| 0.0205        | 7.0   | 2779 | 0.0221          | 0.0062 |
| 0.0158        | 8.0   | 3176 | 0.0134          | 0.0062 |
| 0.0074        | 9.0   | 3573 | 0.0086          | 0.0040 |
| 0.0074        | 10.0  | 3970 | 0.0056          | 0.0027 |
| 0.0036        | 11.0  | 4367 | 0.0033          | 0.0018 |
| 0.0079        | 12.0  | 4764 | 0.0075          | 0.0049 |
| 0.002         | 13.0  | 5161 | 0.0039          | 0.0027 |
| 0.0004        | 14.0  | 5558 | 0.0028          | 0.0022 |
| 0.0001        | 15.0  | 5955 | 0.0039          | 0.0027 |
| 0.0001        | 16.0  | 6352 | 0.0047          | 0.0035 |
| 0.0011        | 17.0  | 6749 | 0.0041          | 0.0027 |
| 0.0001        | 18.0  | 7146 | 0.0053          | 0.0018 |
| 0.0001        | 19.0  | 7543 | 0.0047          | 0.0018 |
| 0.0001        | 20.0  | 7940 | 0.0047          | 0.0018 |
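The CER reported above is the character-level edit (Levenshtein) distance between the predicted and reference strings, divided by the reference length. A self-contained sketch, using made-up plate strings for illustration:

```python
def cer(prediction: str, reference: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    # Dynamic-programming edit distance over characters.
    prev = list(range(len(reference) + 1))
    for i, p in enumerate(prediction, start=1):
        curr = [i]
        for j, r in enumerate(reference, start=1):
            cost = 0 if p == r else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(reference)

print(cer("A8C 1234", "ABC 1234"))  # 0.125 (one substitution over 8 chars)
```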
|
|
|
|
|
## Framework versions |
|
|
|
- Transformers 4.42.3 |
|
- Pytorch 2.1.2 |
|
- Datasets 2.20.0 |
|
- Tokenizers 0.19.1 |