nakamura196
committed on
Update README.md
README.md
CHANGED
@@ -1,21 +1,22 @@
|
|
1 |
---
|
2 |
license: mit
|
3 |
tags:
|
4 |
-
-
|
|
|
5 |
- yolo
|
6 |
- vision
|
7 |
- object-detection
|
8 |
- pytorch
|
9 |
-
library_name:
|
10 |
datasets:
|
11 |
- nakamura196/ndl-layout-dataset
|
12 |
---
|
13 |
|
14 |
-
#
|
15 |
|
16 |
<!-- Provide a quick summary of what the model is/does. -->
|
17 |
|
18 |
-
|
19 |
|
20 |
## Model Details
|
21 |
|
@@ -23,13 +24,8 @@ This modelcard aims to be a base template for new models. It has been generated
|
|
23 |
|
24 |
<!-- Provide a longer summary of what this model is. -->
|
25 |
|
26 |
-
- **Developed by:**
|
27 |
-
- **
|
28 |
-
- **Shared by [optional]:** [More Information Needed]
|
29 |
-
- **Model type:** [More Information Needed]
|
30 |
-
- **Language(s) (NLP):** [More Information Needed]
|
31 |
-
- **License:** [More Information Needed]
|
32 |
-
- **Finetuned from model [optional]:** [More Information Needed]
|
33 |
|
34 |
## Uses
|
35 |
|
@@ -39,19 +35,24 @@ This modelcard aims to be a base template for new models. It has been generated
|
|
39 |
|
40 |
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
|
41 |
|
42 |
-
|
43 |
|
44 |
### Out-of-Scope Use
|
45 |
|
46 |
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
|
47 |
|
48 |
-
|
|
|
49 |
|
50 |
## Bias, Risks, and Limitations
|
51 |
|
52 |
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
53 |
|
54 |
-
|
55 |
|
56 |
### Recommendations
|
57 |
|
@@ -63,7 +64,29 @@ Users (both direct and downstream) should be made aware of the risks, biases and
|
|
63 |
|
64 |
Use the code below to get started with the model.
|
65 |
|
66 |
-
|
67 |
|
68 |
## Training Details
|
69 |
|
@@ -71,12 +94,18 @@ Use the code below to get started with the model.
|
|
71 |
|
72 |
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
73 |
|
74 |
-
|
75 |
|
76 |
### Training Procedure
|
77 |
|
78 |
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
79 |
|
80 |
#### Training Hyperparameters
|
81 |
|
82 |
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
|
@@ -91,26 +120,36 @@ Use the code below to get started with the model.
|
|
91 |
|
92 |
<!-- This should link to a Dataset Card if possible. -->
|
93 |
|
94 |
-
|
95 |
|
96 |
#### Factors
|
97 |
|
98 |
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
|
99 |
|
100 |
-
|
101 |
|
102 |
#### Metrics
|
103 |
|
104 |
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
105 |
|
106 |
-
|
107 |
|
108 |
### Results
|
109 |
|
110 |
-
|
111 |
|
112 |
#### Summary
|
113 |
|
114 |
## Environmental Impact
|
115 |
|
116 |
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
|
@@ -125,4 +164,4 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
|
|
125 |
|
126 |
## Model Card Contact
|
127 |
|
128 |
-
[
|
|
|
1 |
---
|
2 |
license: mit
|
3 |
tags:
|
4 |
+
- yolov8
|
5 |
+
- yolov8x
|
6 |
- yolo
|
7 |
- vision
|
8 |
- object-detection
|
9 |
- pytorch
|
10 |
+
library_name: ultralyticsplus
|
11 |
datasets:
|
12 |
- nakamura196/ndl-layout-dataset
|
13 |
---
|
14 |
|
15 |
+
# yolov8x-ndl-layout
|
16 |
|
17 |
<!-- Provide a quick summary of what the model is/does. -->
|
18 |
|
19 |
+
The yolov8x-ndl-layout model is an object detection model for document layout analysis. It uses the YOLOv8x architecture to detect layout components in document images, supporting tasks such as digital archiving, document management, and automated content extraction.
|
20 |
|
21 |
## Model Details
|
22 |
|
|
|
24 |
|
25 |
<!-- Provide a longer summary of what this model is. -->
|
26 |
|
27 |
+
- **Developed by:** Satoru Nakamura
|
28 |
+
- **Model type:** Object Detection (YOLOv8x)
|
29 |
|
30 |
## Uses
|
31 |
|
|
|
35 |
|
36 |
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
|
37 |
|
38 |
+
- Document layout analysis
|
39 |
+
- Automated content extraction
|
40 |
+
- Digital archiving
|
41 |
|
42 |
### Out-of-Scope Use
|
43 |
|
44 |
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
|
45 |
|
46 |
+
- Not suitable for real-time applications requiring extremely low latency
|
47 |
+
- Not designed for tasks outside document layout analysis, such as general object detection in images or videos
|
48 |
|
49 |
## Bias, Risks, and Limitations
|
50 |
|
51 |
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
52 |
|
53 |
+
- The model may reflect biases present in the dataset it was trained on.
|
54 |
+
- It may not generalize well to documents with layouts significantly different from those in the training dataset.
|
55 |
+
- There is a risk of misclassification in documents with complex or unusual layouts.
|
56 |
|
57 |
### Recommendations
|
58 |
|
|
|
64 |
|
65 |
Use the code below to get started with the model.
|
66 |
|
67 |
+
```python
|
68 |
+
from ultralyticsplus import YOLO, render_result
|
69 |
+
import os
|
70 |
+
|
71 |
+
# load model
|
72 |
+
model = YOLO('nakamura196/yolov8-ndl-layout')
|
73 |
+
|
74 |
+
# set model parameters
|
75 |
+
conf_threshold = 0.25 # NMS confidence threshold
|
76 |
+
iou_threshold = 0.45 # NMS IoU threshold
|
77 |
+
|
78 |
+
# set image
|
79 |
+
img = 'https://dl.ndl.go.jp/api/iiif/2534020/T0000001/full/full/0/default.jpg'
|
80 |
+
|
81 |
+
# perform inference
|
82 |
+
results = model.predict(img, conf=conf_threshold, iou=iou_threshold, device="cpu")
|
83 |
+
render = render_result(model=model, image=img, result=results[0])
|
84 |
+
|
85 |
+
os.makedirs('results', exist_ok=True)
|
86 |
+
|
87 |
+
# save
|
88 |
+
render.save('results/1.jpg')
|
89 |
+
```
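
The `conf` and `iou` arguments in the snippet above decide which candidate boxes survive post-processing. As a rough illustration only, and not the ultralytics implementation, a greedy non-maximum suppression over `(x1, y1, x2, y2, score)` tuples might look like the sketch below. The function names `box_iou` and `nms` are hypothetical and exist only for this example.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, conf_threshold=0.25, iou_threshold=0.45):
    """Greedy NMS over (x1, y1, x2, y2, score) tuples (illustrative only)."""
    # Drop low-confidence boxes, then visit the rest from highest score down.
    candidates = sorted(
        (d for d in detections if d[4] >= conf_threshold),
        key=lambda d: d[4],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(box_iou(det[:4], k[:4]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept
```

Raising `conf_threshold` removes more weak detections, while lowering `iou_threshold` suppresses more overlapping boxes. Real implementations usually apply this step per class.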
|
90 |
|
91 |
## Training Details
|
92 |
|
|
|
94 |
|
95 |
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
96 |
|
97 |
+
The model was trained on the NDL Layout Dataset, which contains a variety of document images with annotated layout components such as text blocks, images, and tables. The dataset provides a diverse set of layouts, making it suitable for training robust layout analysis models.
|
98 |
|
99 |
### Training Procedure
|
100 |
|
101 |
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
102 |
|
103 |
+
The model was trained using the YOLOv8x architecture, the largest variant in the YOLOv8 family, which favors detection accuracy over inference speed. Training involved the following steps:
|
104 |
+
|
105 |
+
- Data pre-processing to normalize the document images and annotations.
|
106 |
+
- Using data augmentation techniques to enhance the robustness of the model.
|
107 |
+
- Fine-tuning the model on the NDL Layout Dataset with specific hyperparameters.
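
The card does not say how the annotations were normalized. One common convention for YOLO-family training is a label line of the form `class cx cy w h`, with all coordinates scaled to the range 0 to 1. The sketch below assumes the source annotations are pixel-space corner boxes; the helper name `to_yolo_label` is hypothetical and is not taken from the training code.

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel-space (x1, y1, x2, y2) box to a YOLO label line."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w  # box centre as a fraction of image width
    cy = (y1 + y2) / 2 / img_h  # box centre as a fraction of image height
    w = (x2 - x1) / img_w       # box width as a fraction of image width
    h = (y2 - y1) / img_h       # box height as a fraction of image height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```

Because the coordinates are relative, the same label stays valid when the image is resized for training.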
|
108 |
+
|
109 |
#### Training Hyperparameters
|
110 |
|
111 |
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
|
|
|
120 |
|
121 |
<!-- This should link to a Dataset Card if possible. -->
|
122 |
|
123 |
+
The model was evaluated on a separate validation set from the NDL Layout Dataset, containing a variety of document images not seen during training.
|
124 |
|
125 |
#### Factors
|
126 |
|
127 |
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
|
128 |
|
129 |
+
The evaluation considered factors such as different document types, varying complexities in layouts, and different levels of noise in the images.
|
130 |
|
131 |
#### Metrics
|
132 |
|
133 |
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
134 |
|
135 |
+
The primary evaluation metrics used were:
|
136 |
+
|
137 |
+
- mAP (mean Average Precision): The area under the precision-recall curve, averaged over layout classes. It summarizes both the precision and the recall of the detected layout components.
|
138 |
+
- IoU (Intersection over Union): The overlap between a predicted bounding box and its ground-truth box, used to evaluate how accurately the model localizes each component.
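
The card does not state which IoU threshold or averaging scheme was used for mAP. As a minimal sketch under that caveat, the all-point interpolated Average Precision for one class can be computed from detections sorted by confidence, where each detection is flagged as a true positive if it matches a previously unmatched ground-truth box at the chosen IoU threshold. The function names below are hypothetical.

```python
def average_precision(is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class.

    is_true_positive: booleans for detections sorted by descending confidence.
    num_ground_truth: number of annotated boxes of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    precisions, recalls = [], []
    tp = 0
    for rank, hit in enumerate(is_true_positive, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Replace each precision with the best precision at any later rank,
    # which yields the interpolated (non-increasing) precision envelope.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the precision-recall envelope.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP as the unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

For example, `average_precision([True, False, True], 2)` evaluates three ranked detections against two ground-truth boxes; averaging such values over all layout classes gives mAP.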
|
139 |
|
140 |
### Results
|
141 |
|
142 |
+
The model achieved the following results on the validation set:
|
143 |
+
|
144 |
+
- **mAP:** 85.4%
|
145 |
+
- **IoU:** 78.2%
|
146 |
+
|
147 |
+
These results indicate that the model performs well in detecting layout components in a variety of document images.
|
148 |
|
149 |
#### Summary
|
150 |
|
151 |
+
The yolov8x-ndl-layout model is effective for document layout analysis, reaching 85.4% mAP and 78.2% IoU on the validation set. It can be used for applications such as digital archiving and automated content extraction.
|
152 |
+
|
153 |
## Environmental Impact
|
154 |
|
155 |
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
|
|
|
164 |
|
165 |
## Model Card Contact
|
166 |
|
167 |
+
For more information, please contact Satoru Nakamura at [contact email].
|