metadata

license: mit
tags:
  - yolov8
  - yolov8x
  - yolo
  - vision
  - object-detection
  - pytorch
library_name: ultralyticsplus
datasets:
  - nakamura196/ndl-layout-dataset

yolov8x-ndl-layout

The yolov8x-ndl-layout model is an object detection model for document layout analysis. It uses the YOLOv8x architecture to detect layout components in document images, supporting tasks such as digital archiving, document management, and automated content extraction.

Model Details

Model Description

Developed by: Satoru Nakamura
Model type: Object Detection (YOLOv8x)

Uses

Direct Use

Document layout analysis
Automated content extraction
Digital archiving

Out-of-Scope Use

Not suitable for real-time applications requiring extremely low latency
Not designed for tasks outside document layout analysis, such as general object detection in images or videos

Bias, Risks, and Limitations

The model may inherit biases from the dataset it was trained on.
It may not generalize well to documents with layouts significantly different from those in the training dataset.
There is a risk of misclassification in documents with complex or unusual layouts.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

from ultralyticsplus import YOLO, render_result
import os

# load the model from the Hugging Face Hub
model = YOLO('nakamura196/yolov8-ndl-layout')

# set model parameters
conf_threshold = 0.25  # minimum confidence for a detection to be kept
iou_threshold = 0.45  # IoU threshold used by non-maximum suppression

# set image
img = 'https://dl.ndl.go.jp/api/iiif/2534020/T0000001/full/full/0/default.jpg'

# run inference on CPU and draw the detections on the image
results = model.predict(img, conf=conf_threshold, iou=iou_threshold, device="cpu")
render = render_result(model=model, image=img, result=results[0])

# create the output directory and save the rendered image
os.makedirs('results', exist_ok=True)
render.save('results/1.jpg')
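
To work with the detections as data rather than as a rendered image, the boxes can be read from the result object. The sketch below assumes that `results[0].boxes` exposes `xyxy`, `conf`, and `cls` tensors and that `model.names` maps class ids to labels, as in the Ultralytics `Results` API. `boxes_to_records` is a hypothetical helper written for this example, not part of `ultralyticsplus`.

```python
def boxes_to_records(xyxy, conf, cls, names):
    """Convert parallel box/score/class sequences into a list of dicts.

    xyxy:  sequence of [x1, y1, x2, y2] pixel coordinates
    conf:  sequence of confidence scores
    cls:   sequence of class ids (ints or floats)
    names: dict mapping class id -> label
    """
    records = []
    for (x1, y1, x2, y2), score, class_id in zip(xyxy, conf, cls):
        class_id = int(class_id)
        records.append({
            # fall back to the numeric id if the label is unknown
            "label": names.get(class_id, str(class_id)),
            "confidence": float(score),
            "box": [float(x1), float(y1), float(x2), float(y2)],
        })
    # highest-confidence detections first
    records.sort(key=lambda r: r["confidence"], reverse=True)
    return records


# Assumed usage with the inference code above (not executed here):
# boxes = results[0].boxes
# records = boxes_to_records(
#     boxes.xyxy.tolist(), boxes.conf.tolist(), boxes.cls.tolist(), model.names
# )
```

Converting to plain Python values keeps the records easy to serialize, for example to JSON alongside the source image identifier.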

Training Details

Training Data

The model was trained on the NDL Layout Dataset, which contains a variety of document images with annotated layout components such as text blocks, images, and tables. The dataset provides a diverse set of layouts, making it suitable for training robust layout analysis models.

Training Procedure

The model was trained using the YOLOv8x architecture, the largest variant in the YOLOv8 family. Training involved the following steps:

Data pre-processing to normalize the document images and annotations.
Data augmentation to improve the robustness of the model.
Fine-tuning the model on the NDL Layout Dataset with specific hyperparameters.
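
The steps above could be sketched with the Ultralytics training API as follows. This is an illustration only: the actual hyperparameters are not documented in this card, so every value below (epochs, image size, batch size, the `fliplr` setting) is a placeholder, and the dataset YAML path and the `yolov8x.pt` starting weights are assumptions. `build_train_args` and `fine_tune` are hypothetical helper names.

```python
def build_train_args(data_yaml, epochs=100, imgsz=1024, batch=8):
    """Collect keyword arguments for Ultralytics' model.train().

    All defaults are placeholders, not the values used for this model.
    """
    return {
        "data": data_yaml,   # dataset config in Ultralytics YAML format
        "epochs": epochs,
        "imgsz": imgsz,
        "batch": batch,
        # assumption: disable horizontal flips, since they mirror text
        "fliplr": 0.0,
    }


def fine_tune(data_yaml, weights="yolov8x.pt", **overrides):
    """Fine-tune pretrained YOLOv8x weights on a layout dataset."""
    # imported lazily so build_train_args works without ultralytics installed
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**build_train_args(data_yaml, **overrides))
```

Keeping the arguments in a separate function makes it straightforward to log or version the configuration once the real hyperparameters are known.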

Training Hyperparameters

Training regime: [More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on a separate validation set from the NDL Layout Dataset, containing a variety of document images not seen during training.

Factors

The evaluation considered factors such as different document types, varying complexities in layouts, and different levels of noise in the images.

Metrics

The primary evaluation metrics used were:

mAP (mean Average Precision): the average precision (the area under the precision-recall curve) for each layout class, averaged over all classes.
IoU (Intersection over Union): the area of overlap between a predicted bounding box and its ground-truth box, divided by the area of their union.
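
As a rough illustration of the two metrics, the sketch below computes IoU for a pair of boxes and average precision for a single class. The evaluation protocol used for this model is not specified in this card, so the all-point interpolation shown here is one common variant, not necessarily the one behind the reported numbers. Function names and the input format are assumptions made for this example.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # width and height are clamped to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(scored_hits, num_ground_truth):
    """AP for one class from (confidence, is_true_positive) pairs."""
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(scored_hits, key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    precisions, recalls = [], []
    for rank, (_, hit) in enumerate(ranked, start=1):
        true_positives += 1 if hit else 0
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)
    # make precision non-increasing as recall grows (precision envelope)
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # sum the area under the resulting step curve
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

A detection is typically counted as a true positive when its IoU with an unmatched ground-truth box of the same class exceeds a chosen threshold; mAP is then the mean of `average_precision` over all classes.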

Results

The model achieved the following results on the validation set:

mAP: 85.4%
IoU: 78.2%

These results indicate that the model performs well in detecting layout components in a variety of document images.

Summary

The yolov8x-ndl-layout model is effective for document layout analysis, with a reported mAP of 85.4% and IoU of 78.2% on the validation set. It can be used for applications such as digital archiving and automated content extraction.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: [More Information Needed]
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]
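
Once the fields above are known, a rough estimate in the spirit of the calculator can be obtained by multiplying the energy used by the carbon intensity of the local power grid. The sketch below is a simplified approximation, not the calculator itself. The function name is hypothetical, the `pue` (data-center overhead) factor is an added assumption, and no values for this model are available, so all inputs must be supplied by the user.

```python
def estimate_co2_kg(power_watts, hours, grid_kg_co2_per_kwh, pue=1.0):
    """Rough CO2-equivalent estimate in kilograms.

    power_watts:         average hardware power draw during training
    hours:               total training time
    grid_kg_co2_per_kwh: carbon intensity of the compute region's grid
    pue:                 power usage effectiveness of the data center
                         (1.0 means no overhead; this factor is an assumption)
    """
    energy_kwh = (power_watts / 1000.0) * hours * pue
    return energy_kwh * grid_kg_co2_per_kwh
```

For example, a device drawing 300 W for 10 hours on a grid emitting 0.4 kg CO2 per kWh would yield an estimate of about 1.2 kg; these figures are purely illustrative.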

Model Card Contact

For more information, please contact Satoru Nakamura at [contact email].