---
license: mit
tags:
- yolov8
- yolov8x
- yolo
- vision
- object-detection
- pytorch
library_name: ultralyticsplus
datasets:
- nakamura196/ndl-layout-dataset
---
yolov8x-ndl-layout
The yolov8x-ndl-layout model is an object detection model for document layout analysis. It uses the YOLOv8x architecture to detect layout components in document images, supporting tasks such as digital archiving, document management, and automated content extraction.
Model Details
Model Description
- Developed by: Satoru Nakamura
- Model type: Object Detection (YOLOv8x)
Uses
Direct Use
- Document layout analysis
- Automated content extraction
- Digital archiving
Out-of-Scope Use
- Not suitable for real-time applications requiring very low latency, since YOLOv8x is the largest and most computationally expensive YOLOv8 variant
- Not designed for tasks outside document layout analysis, such as general object detection in images or videos
Bias, Risks, and Limitations
- The model might have biases based on the specific dataset it was trained on.
- It may not generalize well to documents with layouts significantly different from those in the training dataset.
- There is a risk of misclassification in documents with complex or unusual layouts.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
```python
import os

from ultralyticsplus import YOLO, render_result

# load model
model = YOLO('nakamura196/yolov8-ndl-layout')

# set model parameters
conf_threshold = 0.25  # NMS confidence threshold
iou_threshold = 0.45  # NMS IoU threshold

# set image
img = 'https://dl.ndl.go.jp/api/iiif/2534020/T0000001/full/full/0/default.jpg'

# perform inference
results = model.predict(img, conf=conf_threshold, iou=iou_threshold, device="cpu")

# draw the detected layout regions on the input image
render = render_result(model=model, image=img, result=results[0])

# save
os.makedirs('results', exist_ok=True)
render.save('results/1.jpg')
```
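The two thresholds passed to `predict` control post-processing: candidate boxes whose confidence does not exceed `conf` are discarded, and among overlapping boxes of the same class, any box whose IoU with a higher-scoring kept box exceeds `iou` is suppressed (non-maximum suppression, NMS). The plain-Python sketch below illustrates that logic. It is a simplified, hypothetical re-implementation for explanation only (the names `iou` and `filter_and_nms` are illustrative), not the code Ultralytics runs internally.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    classes: Sequence[int],
    conf_threshold: float = 0.25,
    iou_threshold: float = 0.45,
) -> List[int]:
    """Return indices of detections kept after confidence filtering and per-class NMS."""
    # 1. Discard candidates whose score does not exceed the confidence threshold.
    candidates = [i for i, s in enumerate(scores) if s > conf_threshold]
    # 2. Visit the remaining candidates from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # 3. Keep a box unless it overlaps an already-kept box of the same
        #    class by more than the IoU threshold.
        if all(
            classes[j] != classes[i] or iou(boxes[i], boxes[j]) <= iou_threshold
            for j in kept
        ):
            kept.append(i)
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 100, 100), (10, 10, 110, 110), (200, 200, 300, 300), (0, 0, 100, 100)]
    scores = [0.9, 0.8, 0.6, 0.2]
    classes = [0, 0, 0, 1]
    print(filter_and_nms(boxes, scores, classes))  # [0, 2]
```

With the default thresholds, the two heavily overlapping class-0 boxes (IoU of about 0.68) collapse to the higher-scoring one, and the box scoring 0.2 is dropped before NMS runs. Raising `conf` trades recall for precision; raising `iou` keeps more overlapping regions, which can matter for densely packed layout elements.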
Training Details
Training Data
The model was trained on the NDL Layout Dataset (nakamura196/ndl-layout-dataset), which contains document images with annotated layout components such as text blocks, images, and tables. The dataset covers a diverse range of layouts, which makes it well suited to training layout analysis models.
Training Procedure
The model uses the YOLOv8x architecture, the largest variant in the YOLOv8 family of object detectors. Training involved the following steps:
- Pre-processing the document images and annotations to normalize them.
- Applying data augmentation to improve the model's robustness.
- Fine-tuning the model on the NDL Layout Dataset with task-specific hyperparameters (not reported in this card).
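The card does not describe the pre-processing pipeline in detail. As one plausible illustration, assume the annotations were converted to the standard YOLO label format used for Ultralytics training (one `class x_center y_center width height` line per box, with coordinates normalized by the image size) and that images were letterboxed into a square input. Under those assumptions, the conversion can be sketched as follows. The helper names and the 640-pixel target (the Ultralytics default `imgsz`) are hypothetical; the actual training resolution for this model is not reported.

```python
from typing import Tuple


def to_yolo_label(class_id: int, box: Tuple[float, float, float, float],
                  img_w: int, img_h: int) -> str:
    """Hypothetical helper: convert a pixel (x1, y1, x2, y2) box into a YOLO label line.

    YOLO labels store class, x_center, y_center, width, height, with the
    four coordinates normalized to [0, 1] by the image width and height.
    """
    x1, y1, x2, y2 = box
    # Clip to the image bounds so the normalized values stay within [0, 1].
    x1, x2 = max(0.0, min(x1, img_w)), max(0.0, min(x2, img_w))
    y1, y2 = max(0.0, min(y1, img_h)), max(0.0, min(y2, img_h))
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def letterbox_params(img_w: int, img_h: int, target: int = 640):
    """Hypothetical helper: scale factor, resized size, and padding that fit an
    image into a target x target square while preserving its aspect ratio."""
    scale = min(target / img_w, target / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    # Split the leftover space evenly between both sides.
    pad_x, pad_y = (target - new_w) / 2, (target - new_h) / 2
    return scale, (new_w, new_h), (pad_x, pad_y)


if __name__ == "__main__":
    print(to_yolo_label(2, (100, 200, 300, 600), 1000, 800))
    print(letterbox_params(1280, 960))
```

Because YOLO labels are normalized, they stay valid after the image is resized; only the padding offset has to be accounted for when mapping predictions back to original page coordinates.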
Training Hyperparameters
- Training regime: [More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was evaluated on a separate validation set from the NDL Layout Dataset, containing a variety of document images not seen during training.
Factors
The evaluation considered factors such as different document types, varying complexities in layouts, and different levels of noise in the images.
Metrics
The primary evaluation metrics used were:
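- mAP (mean Average Precision): summarizes the precision and recall of the detected layout components across confidence levels, averaged over all layout classes.
- IoU (Intersection over Union): measures how closely the predicted bounding boxes overlap the ground-truth boxes (area of overlap divided by area of union).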
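The two metrics are linked: a detection counts as a true positive when its IoU with a still-unmatched ground-truth box of the same class reaches a threshold, AP is the area under the resulting precision-recall curve, and mAP averages AP over classes. The card does not state which IoU threshold its mAP uses (for example 0.5, or a COCO-style average over 0.5 to 0.95), so the minimal sketch below assumes 0.5 and all-point interpolation. The Ultralytics validator uses a 101-point interpolated variant, so its numbers may differ slightly. All function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: Sequence[Tuple[Box, float]], gts: Sequence[Box],
                      iou_threshold: float = 0.5) -> float:
    """AP for one class: dets are (box, score) pairs, gts are ground-truth boxes."""
    if not gts:
        return 0.0
    used = [False] * len(gts)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    # Greedily match detections, highest score first, to the best unmatched GT box.
    for box, _ in sorted(dets, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, iou_threshold
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not used[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used[best_j] = True
            tp += 1
        else:
            fp += 1
        recalls.append(tp / len(gts))
        precisions.append(tp / (tp + fp))
    # Pad the curve, then make precision monotonically non-increasing in recall.
    mrec = [0.0] + recalls + [1.0]
    mpre = [1.0] + precisions + [0.0]
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    # All-point interpolation: sum rectangle areas under the precision envelope.
    return sum((mrec[k + 1] - mrec[k]) * mpre[k + 1] for k in range(len(mrec) - 1))


def mean_average_precision(per_class: Dict[int, Tuple[list, list]]) -> float:
    """mAP: unweighted mean of per-class AP; per_class maps class -> (dets, gts)."""
    aps = [average_precision(dets, gts) for dets, gts in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    dets = [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8), ((20, 20, 30, 30), 0.7)]
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    print(round(average_precision(dets, gts), 4))  # 0.8333
```

In the example, the false positive ranked second lowers precision at recall 0.5, but the precision envelope lifts it back to 2/3 for the final recall step, giving AP = 0.5 x 1 + 0.5 x 2/3 = 5/6.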
Results
The model achieved the following results on the validation set:
- mAP: 85.4%
- IoU: 78.2%
These results indicate that the model performs well in detecting layout components in a variety of document images.
Summary
The yolov8x-ndl-layout model is effective for document layout analysis, reaching 85.4% mAP on the validation set. It can be used for applications such as digital archiving and automated content extraction.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
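Once the fields above are known, the estimate roughly reduces to a single product of power, time, and grid carbon intensity. The symbols below are introduced here for illustration and simplify what the calculator does:

```latex
\text{CO}_2\text{eq}\ [\text{kg}] \approx P\ [\text{kW}] \times t\ [\text{h}] \times I\ [\text{kg CO}_2\text{eq}/\text{kWh}]
```

Here P is the average power draw of the listed hardware type, t is the hours used, and I is the carbon intensity of the electricity grid in the compute region, which depends on the cloud provider and region.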
Model Card Contact
For more information, please contact Satoru Nakamura at [contact email].