---
license: mit
tags:
- yolov8
- yolov8x
- yolo
- vision
- object-detection
- pytorch
library_name: ultralyticsplus
datasets:
- nakamura196/ndl-layout-dataset
---
# yolov8x-ndl-layout
<!-- Provide a quick summary of what the model is/does. -->
The yolov8x-ndl-layout model is an object detection model tailored to document layout analysis. It uses the YOLOv8x architecture to detect layout components in document images, supporting tasks such as digital archiving, document management, and automated content extraction.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Satoru Nakamura
- **Model type:** Object Detection (YOLOv8x)
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
- Document layout analysis
- Automated content extraction
- Digital archiving
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
- Not suitable for real-time applications requiring extremely low latency, since YOLOv8x is the largest and slowest variant of the YOLOv8 family
- Not designed for tasks outside document layout analysis, such as general object detection in images or videos
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
- The model might have biases based on the specific dataset it was trained on.
- It may not generalize well to documents with layouts significantly different from those in the training dataset.
- There is a risk of misclassification in documents with complex or unusual layouts.
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from ultralyticsplus import YOLO, render_result
import os
# load model
model = YOLO('nakamura196/yolov8-ndl-layout')
# set inference parameters
conf_threshold = 0.25  # minimum confidence score for a detection to be kept
iou_threshold = 0.45  # IoU threshold used by non-maximum suppression (NMS)
# set image (a page image served through the NDL IIIF Image API)
img = 'https://dl.ndl.go.jp/api/iiif/2534020/T0000001/full/full/0/default.jpg'
# perform inference on the CPU
results = model.predict(img, conf=conf_threshold, iou=iou_threshold, device="cpu")
# draw the predicted layout boxes on the page image
render = render_result(model=model, image=img, result=results[0])
# save the rendered image
os.makedirs('results', exist_ok=True)
render.save('results/1.jpg')
```
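To make the two thresholds in the example concrete, the following plain-Python sketch shows what they control: detections scoring at or below `conf_threshold` are discarded, and non-maximum suppression (NMS) then removes any box whose IoU with a higher-scoring box of the same class exceeds `iou_threshold`. This is a simplified illustration assuming class-aware greedy NMS on `(x1, y1, x2, y2)` pixel boxes; Ultralytics performs this step internally with its own vectorized implementation, and the helpers `iou` and `filter_detections` are hypothetical.

```python
# Simplified sketch of confidence filtering followed by greedy, class-aware NMS.
# Assumption: boxes are (x1, y1, x2, y2) in pixels. iou and filter_detections
# are hypothetical helpers, not part of ultralytics or ultralyticsplus.
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: List[Box],
    scores: List[float],
    classes: List[int],
    conf_threshold: float = 0.25,
    iou_threshold: float = 0.45,
) -> List[int]:
    """Return indices of the detections kept after confidence filtering and NMS."""
    # 1. Drop candidates whose confidence does not exceed conf_threshold.
    candidates = [i for i, s in enumerate(scores) if s > conf_threshold]
    # 2. Visit the remaining candidates from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in candidates:
        # 3. Keep a box only if no already-kept box of the same class overlaps it
        #    by more than iou_threshold.
        if all(
            classes[i] != classes[k] or iou(boxes[i], boxes[k]) <= iou_threshold
            for k in keep
        ):
            keep.append(i)
    return keep


# Toy example: two heavily overlapping class-0 boxes and one low-confidence box.
boxes = [(0, 0, 100, 100), (10, 10, 110, 110), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.1]
classes = [0, 0, 1]
print(filter_detections(boxes, scores, classes))  # → [0]
```

Here the second box is suppressed because its IoU with the first is about 0.68, and the third falls below the confidence threshold. Raising `iou_threshold` lets more overlapping boxes survive, while lowering `conf_threshold` keeps more low-confidence detections, generally trading precision for recall.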
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
The model was trained on the [NDL Layout Dataset](https://huggingface.co/datasets/nakamura196/ndl-layout-dataset), which contains a variety of document images with annotated layout components such as text blocks, images, and tables. Its diversity of layouts makes it suitable for training robust layout analysis models.
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
The model uses the YOLOv8x architecture, the largest variant of the YOLOv8 family, which favors detection accuracy over inference speed. Training involved the following steps:
- Pre-processing the document images and annotations to normalize them.
- Applying data augmentation to improve the model's robustness.
- Fine-tuning the model on the NDL Layout Dataset with specific hyperparameters.
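The card does not document the exact pre-processing, but YOLOv8 training labels are conventionally stored as one text file per image, with one `class x_center y_center width height` line per box and all coordinates normalized to [0, 1] by the image width and height. The sketch below shows that conversion, assuming the source annotations are pixel-space `(x1, y1, x2, y2)` boxes; `to_yolo_line` is a hypothetical helper, not part of this repository.

```python
# Minimal sketch of converting a pixel-space box to a YOLO label line.
# Assumption: annotations arrive as (x1, y1, x2, y2) pixel coordinates.
# to_yolo_line is a hypothetical helper, not part of this repository.
from typing import Tuple


def to_yolo_line(
    class_id: int,
    box: Tuple[float, float, float, float],
    img_w: int,
    img_h: int,
) -> str:
    """Return 'cls x_center y_center width height', normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    # Clip to the image bounds so normalized values stay inside [0, 1].
    x1, x2 = max(0.0, min(x1, img_w)), max(0.0, min(x2, img_w))
    y1, y2 = max(0.0, min(y1, img_h)), max(0.0, min(y2, img_h))
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


print(to_yolo_line(0, (100, 200, 300, 600), img_w=1000, img_h=1000))
# → 0 0.200000 0.400000 0.200000 0.400000
```

Normalizing by image size makes the labels independent of page resolution, which matters when scanned pages of different sizes are resized to a common training input size.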
#### Training Hyperparameters
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
The model was evaluated on a separate validation set from the NDL Layout Dataset, containing a variety of document images not seen during training.
#### Factors
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
The evaluation considered factors such as different document types, varying complexities in layouts, and different levels of noise in the images.
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
The primary evaluation metrics used were:
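- mAP (mean Average Precision): the area under the precision-recall curve, averaged over all layout classes, summarizing both the precision and the recall of the detected layout components.
- IoU (Intersection over Union): the area of overlap between a predicted box and its ground-truth box divided by the area of their union, measuring how accurately the predicted bounding boxes are localized.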
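The two metrics are linked: IoU decides whether a prediction counts as a match for a ground-truth box, and AP summarizes precision and recall over the ranked matches. The sketch below illustrates this for a single class, assuming greedy matching at an IoU threshold of 0.5 and all-point interpolated AP. The card does not state which IoU threshold or interpolation scheme produced the reported numbers, and `iou`, `match_detections`, and `average_precision` are hypothetical helpers. mAP would then be the mean of the per-class AP values.

```python
# Minimal sketch: IoU-based matching and all-point interpolated AP for one class.
# Assumptions: boxes are (x1, y1, x2, y2); greedy matching at IoU >= 0.5.
# iou, match_detections and average_precision are hypothetical helpers.
from typing import List, Sequence, Set, Tuple

Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Area of intersection divided by area of union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(
    preds: Sequence[Box], scores: Sequence[float], gts: Sequence[Box], iou_thr: float = 0.5
) -> List[bool]:
    """Mark each prediction as a true positive (True) or a false positive (False)."""
    order = sorted(range(len(preds)), key=lambda i: scores[i], reverse=True)
    matched: Set[int] = set()
    is_tp = [False] * len(preds)
    for i in order:  # higher-scoring predictions claim ground-truth boxes first
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            overlap = iou(preds[i], gt)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            is_tp[i] = True
    return is_tp


def average_precision(scores: Sequence[float], is_tp: Sequence[bool], num_gt: int) -> float:
    """Area under the interpolated precision-recall curve for one class."""
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [0.0], [0.0]
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at recall r is the best precision at any recall >= r.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the rectangles wherever recall increases.
    return sum(
        (recalls[k] - recalls[k - 1]) * precisions[k] for k in range(1, len(recalls))
    )


gts = [(0, 0, 100, 100), (200, 200, 300, 300)]
preds = [(0, 0, 100, 100), (10, 10, 110, 110), (400, 400, 500, 500), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7, 0.6]
is_tp = match_detections(preds, scores, gts)
print(is_tp)  # → [True, False, False, True]
print(average_precision(scores, is_tp, num_gt=len(gts)))  # → 0.75
```

In the toy example the duplicate box around the first ground truth and the box with no ground truth both become false positives, which lowers precision but not recall.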
### Results
The model achieved the following results on the validation set:
- **mAP:** 85.4%
- **IoU:** 78.2%

These results indicate that the model performs well at detecting layout components across a variety of document images.
#### Summary
The yolov8x-ndl-layout model is effective for document layout analysis, achieving an mAP of 85.4% on the validation set. It can be used for applications such as digital archiving and automated content extraction.
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
## Model Card Contact
For more information, please contact Satoru Nakamura at [contact email].