AlekseyKorshuk committed on
Commit
dc7c5f5
·
1 Parent(s): a4d0487
detection_metrics.py CHANGED
@@ -5,9 +5,1701 @@ import torch
5
  import evaluate
6
  import json
7
  from tqdm import tqdm
8
- from detection_metrics.pycocotools.coco import COCO
9
- from detection_metrics.coco_evaluate import COCOEvaluator
10
- from detection_metrics.utils import _TYPING_PREDICTION, _TYPING_REFERENCE
11
 
12
  _DESCRIPTION = "This class evaluates object detection models using the COCO dataset \
13
  and its evaluation metrics."
@@ -37,6 +1729,7 @@ Args:
37
  **kwargs: Additional keyword arguments forwarded to evaluate.Metric.
38
  """
39
 
 
40
  class EvaluateObjectDetection(evaluate.Metric):
41
  """
42
  Class for evaluating object detection models.
@@ -74,7 +1767,7 @@ class EvaluateObjectDetection(evaluate.Metric):
74
  self.coco_evaluator.coco_gt.cats = cats
75
  self.coco_evaluator.coco_gt.dataset["categories"] = list(cats.values())
76
  self.coco_evaluator.coco_eval["bbox"].params.catIds = [c["id"] for c in cats.values()]
77
-
78
  def _info(self):
79
  """
80
  Returns the MetricInfo object with information about the module.
@@ -117,7 +1810,7 @@ class EvaluateObjectDetection(evaluate.Metric):
117
  )
118
 
119
  def _preprocess(
120
- self, predictions: List[Dict[str, torch.Tensor]]
121
  ) -> List[_TYPING_PREDICTION]:
122
  """
123
  Preprocesses the predictions before computing the scores.
@@ -147,15 +1840,15 @@ class EvaluateObjectDetection(evaluate.Metric):
147
  for prediction in predictions:
148
  ret.append({k: v for k, v in prediction.items() if k in required})
149
  return ret
150
-
151
  def _clear_references(self, references):
152
  required = [""]
153
  ret = []
154
  for ref in references:
155
  ret.append({k: v for k, v in ref.items() if k in required})
156
  return ret
157
-
158
- def add(self, *, prediction = None, reference = None, **kwargs):
159
  """
160
  Preprocesses the predictions and references and calls the parent class function.
161
 
@@ -167,7 +1860,7 @@ class EvaluateObjectDetection(evaluate.Metric):
167
  if prediction is not None:
168
  prediction = self._clear_predictions(prediction)
169
  prediction = self._preprocess(prediction)
170
-
171
  res = {} # {image_id} : prediction
172
  for output, target in zip(prediction, reference):
173
  res[target["image_id"][0]] = output
@@ -176,9 +1869,9 @@ class EvaluateObjectDetection(evaluate.Metric):
176
  super(evaluate.Metric, self).add(prediction=prediction, references=reference, **kwargs)
177
 
178
  def _compute(
179
- self,
180
- predictions: List[List[_TYPING_PREDICTION]],
181
- references: List[List[_TYPING_REFERENCE]],
182
  ) -> Dict[str, Dict[str, float]]:
183
  """
184
  Returns the evaluation scores.
@@ -192,12 +1885,12 @@ class EvaluateObjectDetection(evaluate.Metric):
192
  """
193
  print("Synchronizing processes")
194
  self.coco_evaluator.synchronize_between_processes()
195
-
196
  print("Accumulating values")
197
  self.coco_evaluator.accumulate()
198
-
199
  print("Summarizing results")
200
  self.coco_evaluator.summarize()
201
-
202
  stats = self.coco_evaluator.get_results()
203
  return stats
 
5
  import evaluate
6
  import json
7
  from tqdm import tqdm
8
+
9
+ import copy
10
+ import pickle
11
+ from typing import Dict, List, Tuple, Union
12
+ from tqdm import tqdm
13
+ import numpy as np
14
+ import torch
15
+ import torch.distributed as dist
16
+ from datasets import Dataset
17
+
18
+ __author__ = 'tsungyi'
19
+
20
+ # Interface for manipulating masks stored in RLE format.
21
+ #
22
+ # RLE is a simple yet efficient format for storing binary masks. RLE
23
+ # first divides a vector (or vectorized image) into a series of piecewise
24
+ # constant regions and then for each piece simply stores the length of
25
+ # that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would
26
+ # be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1]
27
+ # (note that the odd counts are always the numbers of zeros). Instead of
28
+ # storing the counts directly, additional compression is achieved with a
29
+ # variable bitrate representation based on a common scheme called LEB128.
30
+ #
31
+ # Compression is greatest given large piecewise constant regions.
32
+ # Specifically, the size of the RLE is proportional to the number of
33
+ # *boundaries* in M (or for an image the number of boundaries in the y
34
+ # direction). Assuming fairly simple shapes, the RLE representation is
35
+ # O(sqrt(n)) where n is number of pixels in the object. Hence space usage
36
+ # is substantially lower, especially for large simple objects (large n).
37
+ #
38
+ # Many common operations on masks can be computed directly using the RLE
39
+ # (without need for decoding). This includes computations such as area,
40
+ # union, intersection, etc. All of these operations are linear in the
41
+ # size of the RLE, in other words they are O(sqrt(n)) where n is the area
42
+ # of the object. Computing these operations on the original mask is O(n).
43
+ # Thus, using the RLE can result in substantial computational savings.
44
+ #
45
+ # The following API functions are defined:
46
+ # encode - Encode binary masks using RLE.
47
+ # decode - Decode binary masks encoded via RLE.
48
+ # merge - Compute union or intersection of encoded masks.
49
+ # iou - Compute intersection over union between masks.
50
+ # area - Compute area of encoded masks.
51
+ # toBbox - Get bounding boxes surrounding encoded masks.
52
+ # frPyObjects - Convert polygon, bbox, and uncompressed RLE to encoded RLE mask.
53
+ #
54
+ # Usage:
55
+ # Rs = encode( masks )
56
+ # masks = decode( Rs )
57
+ # R = merge( Rs, intersect=false )
58
+ # o = iou( dt, gt, iscrowd )
59
+ # a = area( Rs )
60
+ # bbs = toBbox( Rs )
61
+ # Rs = frPyObjects( [pyObjects], h, w )
62
+ #
63
+ # In the API the following formats are used:
64
+ # Rs - [dict] Run-length encoding of binary masks
65
+ # R - dict Run-length encoding of binary mask
66
+ # masks - [hxwxn] Binary mask(s) (must have type np.ndarray(dtype=uint8) in column-major order)
67
+ # iscrowd - [nx1] list of np.ndarray. 1 indicates corresponding gt image has crowd region to ignore
68
+ # bbs - [nx4] Bounding box(es) stored as [x y w h]
69
+ # poly - Polygon stored as [[x1 y1 x2 y2...],[x1 y1 ...],...] (2D list)
70
+ # dt,gt - May be either bounding boxes or encoded masks
71
+ # Both poly and bbs are 0-indexed (bbox=[0 0 1 1] encloses first pixel).
72
+ #
73
+ # Finally, a note about the intersection over union (iou) computation.
74
+ # The standard iou of a ground truth (gt) and detected (dt) object is
75
+ # iou(gt,dt) = area(intersect(gt,dt)) / area(union(gt,dt))
76
+ # For "crowd" regions, we use a modified criterion. If a gt object is
77
+ # marked as "iscrowd", we allow a dt to match any subregion of the gt.
78
+ # Choosing gt' in the crowd gt that best matches the dt can be done using
79
+ # gt'=intersect(dt,gt). Since by definition union(gt',dt)=dt, computing
80
+ # iou(gt,dt,iscrowd) = iou(gt',dt) = area(intersect(gt,dt)) / area(dt)
81
+ # For crowd gt regions we use the modified criterion above for the iou.
82
+ #
83
+ # To compile run "python setup.py build_ext --inplace"
84
+ # Please do not contact us for help with compiling.
85
+ #
86
+ # Microsoft COCO Toolbox. version 2.0
87
+ # Data, paper, and tutorials available at: http://mscoco.org/
88
+ # Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
89
+ # Licensed under the Simplified BSD License [see coco/license.txt]
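Review note: the header comment above describes RLE counts in prose. A minimal sketch of that counting rule follows, assuming a flat list of 0/1 values. The helper names `rle_counts` and `rle_decode` are hypothetical and are not part of pycocotools; the sketch leaves out the LEB128 compression and the column-major flattening that the comment also mentions.

```python
from typing import List, Sequence


def rle_counts(mask: Sequence[int]) -> List[int]:
    """Return uncompressed RLE counts for a flat binary mask.

    Counts alternate between runs of 0s and runs of 1s and always start
    with a run of 0s, so a mask that begins with 1 gets a leading 0 count.
    """
    counts: List[int] = []
    current = 0  # value of the run being counted; runs start with zeros
    run = 0
    for value in mask:
        if value == current:
            run += 1
        else:
            counts.append(run)
            current = value
            run = 1
    counts.append(run)
    return counts


def rle_decode(counts: Sequence[int]) -> List[int]:
    """Invert rle_counts: expand alternating run lengths back into 0/1 values."""
    mask: List[int] = []
    value = 0
    for count in counts:
        mask.extend([value] * count)
        value = 1 - value
    return mask


# The two examples given in the header comment.
print(rle_counts([0, 0, 1, 1, 1, 0, 1]))
print(rle_counts([1, 1, 1, 1, 1, 1, 0]))
```

The number of counts grows with the number of 0/1 boundaries rather than with the number of pixels, which is the property the comment relies on for its size argument.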
90
+
91
+ iou = _mask.iou
92
+ merge = _mask.merge
93
+ frPyObjects = _mask.frPyObjects
94
+
95
+
96
+ def encode(bimask):
97
+ if len(bimask.shape) == 3:
98
+ return _mask.encode(bimask)
99
+ elif len(bimask.shape) == 2:
100
+ h, w = bimask.shape
101
+ return _mask.encode(bimask.reshape((h, w, 1), order='F'))[0]
102
+
103
+
104
+ def decode(rleObjs):
105
+ if isinstance(rleObjs, list):
106
+ return _mask.decode(rleObjs)
107
+ else:
108
+ return _mask.decode([rleObjs])[:, :, 0]
109
+
110
+
111
+ def area(rleObjs):
112
+ if isinstance(rleObjs, list):
113
+ return _mask.area(rleObjs)
114
+ else:
115
+ return _mask.area([rleObjs])[0]
116
+
117
+
118
+ def toBbox(rleObjs):
119
+ if isinstance(rleObjs, list):
120
+ return _mask.toBbox(rleObjs)
121
+ else:
122
+ return _mask.toBbox([rleObjs])[0]
123
+
124
+
125
+ # This code is a copy and paste with small modifications of the code:
126
+ # https://github.com/rafaelpadilla/review_object_detection_metrics/blob/main/src/evaluators/coco_evaluator.py
127
+
128
+ from typing import List
129
+ import numpy as np
130
+
131
+
132
+ class MaskEvaluator(object):
133
+ @staticmethod
134
+ def iou(
135
+ dt: List[List[float]], gt: List[List[float]], iscrowd: List[bool]
136
+ ) -> np.ndarray:
137
+ """
138
+ Calculate the intersection over union (IoU) between detection bounding boxes (dt) and \
139
+ ground truth bounding boxes (gt).
140
+ Reference: https://github.com/rafaelpadilla/review_object_detection_metrics
141
+
142
+ Args:
143
+ dt (List[List[float]]): List of detection bounding boxes in the \
144
+ format [x, y, width, height].
145
+ gt (List[List[float]]): List of ground-truth bounding boxes in the \
146
+ format [x, y, width, height].
147
+ iscrowd (List[bool]): List indicating if each ground-truth bounding box \
148
+ is a crowd region or not.
149
+
150
+ Returns:
151
+ np.ndarray: Array of IoU values of shape (len(dt), len(gt)).
152
+ """
153
+ assert len(iscrowd) == len(gt), "iou(iscrowd=) must have the same length as gt"
154
+ if len(dt) == 0 or len(gt) == 0:
155
+ return []
156
+ ious = np.zeros((len(dt), len(gt)), dtype=np.float64)
157
+ for g_idx, g in enumerate(gt):
158
+ for d_idx, d in enumerate(dt):
159
+ ious[d_idx, g_idx] = _jaccard(d, g, iscrowd[g_idx])
160
+ return ious
161
+
162
+
163
+ def _jaccard(a: List[float], b: List[float], iscrowd: bool) -> float:
164
+ """
165
+ Calculate the Jaccard index (intersection over union) between two bounding boxes.
166
+ For "crowd" regions, we use a modified criterion. If a gt object is
167
+ marked as "iscrowd", we allow a dt to match any subregion of the gt.
168
+ Choosing gt' in the crowd gt that best matches the dt can be done using
169
+ gt'=intersect(dt,gt). Since by definition union(gt',dt)=dt, computing
170
+ iou(gt,dt,iscrowd) = iou(gt',dt) = area(intersect(gt,dt)) / area(dt)
171
+ For crowd gt regions we use the modified criterion above for the iou.
172
+
173
+ Args:
174
+ a (List[float]): Bounding box coordinates in the format [x, y, width, height].
175
+ b (List[float]): Bounding box coordinates in the format [x, y, width, height].
176
+ iscrowd (bool): Flag indicating if the second bounding box is a crowd region or not.
177
+
178
+ Returns:
179
+ float: Jaccard index between the two bounding boxes.
180
+ """
181
+ eps = 4e-12
182
+ xa, ya, x2a, y2a = a[0], a[1], a[0] + a[2], a[1] + a[3]
183
+ xb, yb, x2b, y2b = b[0], b[1], b[0] + b[2], b[1] + b[3]
184
+
185
+ # innermost left x
186
+ xi = max(xa, xb)
187
+ # innermost right x
188
+ x2i = min(x2a, x2b)
189
+ # same for y
190
+ yi = max(ya, yb)
191
+ y2i = min(y2a, y2b)
192
+
193
+ # calculate areas
194
+ Aa = max(x2a - xa, 0.) * max(y2a - ya, 0.)
195
+ Ab = max(x2b - xb, 0.) * max(y2b - yb, 0.)
196
+ Ai = max(x2i - xi, 0.) * max(y2i - yi, 0.)
197
+
198
+ if iscrowd:
199
+ return Ai / (Aa + eps)
200
+
201
+ return Ai / (Aa + Ab - Ai + eps)
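Review note: a small worked example may make the crowd rule easier to check. The sketch below restates the same arithmetic as `_jaccard` in a standalone function with the hypothetical name `box_iou_xywh`, assuming boxes in `[x, y, width, height]` format with the detection passed first. The numbers are illustrative only.

```python
from typing import Sequence


def box_iou_xywh(
    dt: Sequence[float],
    gt: Sequence[float],
    iscrowd: bool = False,
    eps: float = 4e-12,
) -> float:
    """IoU of two [x, y, w, h] boxes; for crowd gt, divide by the dt area only."""
    dx1, dy1, dx2, dy2 = dt[0], dt[1], dt[0] + dt[2], dt[1] + dt[3]
    gx1, gy1, gx2, gy2 = gt[0], gt[1], gt[0] + gt[2], gt[1] + gt[3]

    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(min(dx2, gx2) - max(dx1, gx1), 0.0)
    inter_h = max(min(dy2, gy2) - max(dy1, gy1), 0.0)
    inter = inter_w * inter_h

    area_dt = max(dx2 - dx1, 0.0) * max(dy2 - dy1, 0.0)
    area_gt = max(gx2 - gx1, 0.0) * max(gy2 - gy1, 0.0)

    if iscrowd:
        # Crowd rule: the detection may match any sub-region of the gt.
        return inter / (area_dt + eps)
    return inter / (area_dt + area_gt - inter + eps)


# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 4 + 4 - 1 = 7.
print(box_iou_xywh([0, 0, 2, 2], [1, 1, 2, 2]))
# Same boxes with a crowd gt: intersection 1 divided by the detection area 4.
print(box_iou_xywh([0, 0, 2, 2], [1, 1, 2, 2], iscrowd=True))
```

With the crowd flag set, the score depends only on how much of the detection lies inside the ground-truth region, so a small detection fully inside a large crowd box scores close to 1.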
202
+
203
+
204
+ # This code is basically a copy and paste from the original cocoapi repo:
205
+ # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py
206
+ # with the following changes:
207
+ # * Replace the usage of mask (maskUtils) by MaskEvaluator.
208
+ # * Comment out prints in the evaluate() function.
209
+ # * Include a return of the function evaluate. Inspired
210
+ # by @ybelkada (https://huggingface.co/spaces/ybelkada/cocoevaluate/)
211
+
212
+ __author__ = "tsungyi"
213
+
214
+ import copy
215
+ import datetime
216
+ import time
217
+ from collections import defaultdict
218
+ from packaging import version
219
+
220
+ import numpy as np
221
+
222
+ if version.parse(np.__version__) < version.parse("1.24"):
223
+ dtype_float = np.float
224
+ else:
225
+ dtype_float = np.float32
226
+
227
+
228
+ class COCOeval:
229
+ # Interface for evaluating detection on the Microsoft COCO dataset.
230
+ #
231
+ # The usage for CocoEval is as follows:
232
+ # cocoGt=..., cocoDt=... # load dataset and results
233
+ # E = CocoEval(cocoGt,cocoDt); # initialize CocoEval object
234
+ # E.params.recThrs = ...; # set parameters as desired
235
+ # E.evaluate(); # run per image evaluation
236
+ # E.accumulate(); # accumulate per image results
237
+ # E.summarize(); # display summary metrics of results
238
+ # For example usage see evalDemo.m and http://mscoco.org/.
239
+ #
240
+ # The evaluation parameters are as follows (defaults in brackets):
241
+ # imgIds - [all] N img ids to use for evaluation
242
+ # catIds - [all] K cat ids to use for evaluation
243
+ # iouThrs - [.5:.05:.95] T=10 IoU thresholds for evaluation
244
+ # recThrs - [0:.01:1] R=101 recall thresholds for evaluation
245
+ # areaRng - [...] A=4 object area ranges for evaluation
246
+ # maxDets - [1 10 100] M=3 thresholds on max detections per image
247
+ # iouType - ['segm'] set iouType to 'segm', 'bbox' or 'keypoints'
248
+ # iouType replaced the now DEPRECATED useSegm parameter.
249
+ # useCats - [1] if true use category labels for evaluation
250
+ # Note: if useCats=0 category labels are ignored as in proposal scoring.
251
+ # Note: multiple areaRngs [Ax2] and maxDets [Mx1] can be specified.
252
+ #
253
+ # evaluate(): evaluates detections on every image and every category and
254
+ # concats the results into the "evalImgs" with fields:
255
+ # dtIds - [1xD] id for each of the D detections (dt)
256
+ # gtIds - [1xG] id for each of the G ground truths (gt)
257
+ # dtMatches - [TxD] matching gt id at each IoU or 0
258
+ # gtMatches - [TxG] matching dt id at each IoU or 0
259
+ # dtScores - [1xD] confidence of each dt
260
+ # gtIgnore - [1xG] ignore flag for each gt
261
+ # dtIgnore - [TxD] ignore flag for each dt at each IoU
262
+ #
263
+ # accumulate(): accumulates the per-image, per-category evaluation
264
+ # results in "evalImgs" into the dictionary "eval" with fields:
265
+ # params - parameters used for evaluation
266
+ # date - date evaluation was performed
267
+ # counts - [T,R,K,A,M] parameter dimensions (see above)
268
+ # precision - [TxRxKxAxM] precision for every evaluation setting
269
+ # recall - [TxKxAxM] max recall for every evaluation setting
270
+ # Note: precision and recall==-1 for settings with no gt objects.
271
+ #
272
+ # See also coco, mask, pycocoDemo, pycocoEvalDemo
273
+ #
274
+ # Microsoft COCO Toolbox. version 2.0
275
+ # Data, paper, and tutorials available at: http://mscoco.org/
276
+ # Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
277
+ # Licensed under the Simplified BSD License [see coco/license.txt]
278
+ def __init__(self, cocoGt=None, cocoDt=None, iouType="segm"):
279
+ """
280
+ Initialize CocoEval using coco APIs for gt and dt
281
+ :param cocoGt: coco object with ground truth annotations
282
+ :param cocoDt: coco object with detection results
283
+ :return: None
284
+ """
285
+ if not iouType:
286
+ print("iouType not specified. use default iouType segm")
287
+ self.cocoGt = cocoGt # ground truth COCO API
288
+ self.cocoDt = cocoDt # detections COCO API
289
+ self.evalImgs = defaultdict(
290
+ list
291
+ ) # per-image per-category evaluation results [KxAxI] elements
292
+ self.eval = {} # accumulated evaluation results
293
+ self._gts = defaultdict(list) # gt for evaluation
294
+ self._dts = defaultdict(list) # dt for evaluation
295
+ self.params = Params(iouType=iouType) # parameters
296
+ self._paramsEval = {} # parameters for evaluation
297
+ self.stats = [] # result summarization
298
+ self.ious = {} # ious between all gts and dts
299
+ if cocoGt is not None:
300
+ self.params.imgIds = sorted(cocoGt.getImgIds())
301
+ self.params.catIds = sorted(cocoGt.getCatIds())
302
+
303
+ def _prepare(self):
304
+ """
305
+ Prepare ._gts and ._dts for evaluation based on params
306
+ :return: None
307
+ """
308
+
309
+ def _toMask(anns, coco):
310
+ # modify ann['segmentation'] by reference
311
+ for ann in anns:
312
+ rle = coco.annToRLE(ann)
313
+ ann["segmentation"] = rle
314
+
315
+ p = self.params
316
+ if p.useCats:
317
+ gts = self.cocoGt.loadAnns(
318
+ self.cocoGt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds)
319
+ )
320
+ dts = self.cocoDt.loadAnns(
321
+ self.cocoDt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds)
322
+ )
323
+ else:
324
+ gts = self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds))
325
+ dts = self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds))
326
+
327
+ # convert ground truth to mask if iouType == 'segm'
328
+ if p.iouType == "segm":
329
+ _toMask(gts, self.cocoGt)
330
+ _toMask(dts, self.cocoDt)
331
+ # set ignore flag
332
+ for gt in gts:
333
+ gt["ignore"] = gt["ignore"] if "ignore" in gt else 0
334
+ gt["ignore"] = "iscrowd" in gt and gt["iscrowd"]
335
+ if p.iouType == "keypoints":
336
+ gt["ignore"] = (gt["num_keypoints"] == 0) or gt["ignore"]
337
+ self._gts = defaultdict(list) # gt for evaluation
338
+ self._dts = defaultdict(list) # dt for evaluation
339
+ for gt in gts:
340
+ self._gts[gt["image_id"], gt["category_id"]].append(gt)
341
+ for dt in dts:
342
+ self._dts[dt["image_id"], dt["category_id"]].append(dt)
343
+ self.evalImgs = defaultdict(list) # per-image per-category evaluation results
344
+ self.eval = {} # accumulated evaluation results
345
+
346
+ def evaluate(self):
347
+ """
348
+ Run per image evaluation on given images and store results (a list of dict) in self.evalImgs
349
+ :return: None
350
+ """
351
+ # tic = time.time()
352
+ # print("Running per image evaluation...")
353
+ p = self.params
354
+ # add backward compatibility if useSegm is specified in params
355
+ if p.useSegm is not None:
356
+ p.iouType = "segm" if p.useSegm == 1 else "bbox"
357
+ # print(
358
+ # "useSegm (deprecated) is not None. Running {} evaluation".format(
359
+ # p.iouType
360
+ # )
361
+ # )
362
+ # print("Evaluate annotation type *{}*".format(p.iouType))
363
+ p.imgIds = list(np.unique(p.imgIds))
364
+ if p.useCats:
365
+ p.catIds = list(np.unique(p.catIds))
366
+ p.maxDets = sorted(p.maxDets)
367
+ self.params = p
368
+
369
+ self._prepare()
370
+ # loop through images, area range, max detection number
371
+ catIds = p.catIds if p.useCats else [-1]
372
+
373
+ if p.iouType == "segm" or p.iouType == "bbox":
374
+ computeIoU = self.computeIoU
375
+ elif p.iouType == "keypoints":
376
+ computeIoU = self.computeOks
377
+ self.ious = {
378
+ (imgId, catId): computeIoU(imgId, catId)
379
+ for imgId in p.imgIds
380
+ for catId in catIds
381
+ }
382
+
383
+ evaluateImg = self.evaluateImg
384
+ maxDet = p.maxDets[-1]
385
+ self.evalImgs = [
386
+ evaluateImg(imgId, catId, areaRng, maxDet)
387
+ for catId in catIds
388
+ for areaRng in p.areaRng
389
+ for imgId in p.imgIds
390
+ ]
391
+ self._paramsEval = copy.deepcopy(self.params)
392
+ ret_evalImgs = np.asarray(self.evalImgs).reshape(
393
+ len(catIds), len(p.areaRng), len(p.imgIds)
394
+ )
395
+ # toc = time.time()
396
+ # print("DONE (t={:0.2f}s).".format(toc - tic))
397
+ return ret_evalImgs
398
+
399
+ def computeIoU(self, imgId, catId):
400
+ p = self.params
401
+ if p.useCats:
402
+ gt = self._gts[imgId, catId]
403
+ dt = self._dts[imgId, catId]
404
+ else:
405
+ gt = [_ for cId in p.catIds for _ in self._gts[imgId, cId]]
406
+ dt = [_ for cId in p.catIds for _ in self._dts[imgId, cId]]
407
+ if len(gt) == 0 and len(dt) == 0:
408
+ return []
409
+ inds = np.argsort([-d["score"] for d in dt], kind="mergesort")
410
+ dt = [dt[i] for i in inds]
411
+ if len(dt) > p.maxDets[-1]:
412
+ dt = dt[0: p.maxDets[-1]]
413
+
414
+ if p.iouType == "segm":
415
+ g = [g["segmentation"] for g in gt]
416
+ d = [d["segmentation"] for d in dt]
417
+ elif p.iouType == "bbox":
418
+ g = [g["bbox"] for g in gt]
419
+ d = [d["bbox"] for d in dt]
420
+ else:
421
+ raise Exception("unknown iouType for iou computation")
422
+
423
+ # compute iou between each dt and gt region
424
+ iscrowd = [int(o["iscrowd"]) for o in gt]
425
+ ious = MaskEvaluator.iou(d, g, iscrowd)
426
+ return ious
427
+
428
+ def computeOks(self, imgId, catId):
429
+ p = self.params
430
+ # dimension here should be Nxm
431
+ gts = self._gts[imgId, catId]
432
+ dts = self._dts[imgId, catId]
433
+ inds = np.argsort([-d["score"] for d in dts], kind="mergesort")
434
+ dts = [dts[i] for i in inds]
435
+ if len(dts) > p.maxDets[-1]:
436
+ dts = dts[0: p.maxDets[-1]]
437
+ # if len(gts) == 0 and len(dts) == 0:
438
+ if len(gts) == 0 or len(dts) == 0:
439
+ return []
440
+ ious = np.zeros((len(dts), len(gts)))
441
+ sigmas = p.kpt_oks_sigmas
442
+ vars = (sigmas * 2) ** 2
443
+ k = len(sigmas)
444
+ # compute oks between each detection and ground truth object
445
+ for j, gt in enumerate(gts):
446
+ # create bounds for ignore regions(double the gt bbox)
447
+ g = np.array(gt["keypoints"])
448
+ xg = g[0::3]
449
+ yg = g[1::3]
450
+ vg = g[2::3]
451
+ k1 = np.count_nonzero(vg > 0)
452
+ bb = gt["bbox"]
453
+ x0 = bb[0] - bb[2]
454
+ x1 = bb[0] + bb[2] * 2
455
+ y0 = bb[1] - bb[3]
456
+ y1 = bb[1] + bb[3] * 2
457
+ for i, dt in enumerate(dts):
458
+ d = np.array(dt["keypoints"])
459
+ xd = d[0::3]
460
+ yd = d[1::3]
461
+ if k1 > 0:
462
+ # measure the per-keypoint distance if keypoints visible
463
+ dx = xd - xg
464
+ dy = yd - yg
465
+ else:
466
+ # measure minimum distance to keypoints in (x0,y0) & (x1,y1)
467
+ z = np.zeros((k))
468
+ dx = np.max((z, x0 - xd), axis=0) + np.max((z, xd - x1), axis=0)
469
+ dy = np.max((z, y0 - yd), axis=0) + np.max((z, yd - y1), axis=0)
470
+ e = (dx ** 2 + dy ** 2) / vars / (gt["area"] + np.spacing(1)) / 2
471
+ if k1 > 0:
472
+ e = e[vg > 0]
473
+ ious[i, j] = np.sum(np.exp(-e)) / e.shape[0]
474
+ return ious
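Review note: the object keypoint similarity (OKS) formula in `computeOks` is easier to verify for a single detection and ground-truth pair. The sketch below is a simplified, standard-library restatement, assuming flat `[x, y, v, ...]` keypoint lists and at least one labelled ground-truth keypoint. The function name `oks_single` and the sigma values are hypothetical, and the fallback branch for ground truths with no labelled keypoints is left out.

```python
import math
import sys
from typing import Sequence


def oks_single(
    dt_kpts: Sequence[float],
    gt_kpts: Sequence[float],
    sigmas: Sequence[float],
    area: float,
) -> float:
    """OKS between one detection and one ground truth.

    Keypoints are flat lists [x1, y1, v1, x2, y2, v2, ...]. Only ground-truth
    keypoints with visibility v > 0 contribute to the average.
    """
    terms = []
    for i, sigma in enumerate(sigmas):
        xg, yg, vg = gt_kpts[3 * i], gt_kpts[3 * i + 1], gt_kpts[3 * i + 2]
        if vg <= 0:
            continue  # unlabelled keypoint: excluded from the average
        xd, yd = dt_kpts[3 * i], dt_kpts[3 * i + 1]
        variance = (2 * sigma) ** 2
        # Squared distance, normalised by keypoint variance and object area.
        # sys.float_info.epsilon equals np.spacing(1) for float64.
        e = ((xd - xg) ** 2 + (yd - yg) ** 2) / variance
        e = e / (area + sys.float_info.epsilon) / 2
        terms.append(math.exp(-e))
    if not terms:
        raise ValueError("no labelled ground-truth keypoints")
    return sum(terms) / len(terms)


# Hypothetical two-keypoint object; the second gt keypoint is unlabelled (v=0).
gt = [10.0, 10.0, 2, 20.0, 20.0, 0]
perfect = [10.0, 10.0, 1, 99.0, 99.0, 1]
shifted = [13.0, 14.0, 1, 20.0, 20.0, 1]
print(oks_single(perfect, gt, sigmas=[0.05, 0.05], area=400.0))
print(oks_single(shifted, gt, sigmas=[0.05, 0.05], area=400.0))
```

A detection that places every labelled keypoint exactly on the ground truth scores 1, and the score decays towards 0 as the normalised distance grows.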
475
+
476
+ def evaluateImg(self, imgId, catId, aRng, maxDet):
477
+ """
478
+ perform evaluation for single category and image
479
+ :return: dict (single image results)
480
+ """
481
+ p = self.params
482
+ if p.useCats:
483
+ gt = self._gts[imgId, catId]
484
+ dt = self._dts[imgId, catId]
485
+ else:
486
+ gt = [_ for cId in p.catIds for _ in self._gts[imgId, cId]]
487
+ dt = [_ for cId in p.catIds for _ in self._dts[imgId, cId]]
488
+ if len(gt) == 0 and len(dt) == 0:
489
+ return None
490
+
491
+ for g in gt:
492
+ if g["ignore"] or (g["area"] < aRng[0] or g["area"] > aRng[1]):
493
+ g["_ignore"] = 1
494
+ else:
495
+ g["_ignore"] = 0
496
+
497
+ # sort dt highest score first, sort gt ignore last
498
+ gtind = np.argsort([g["_ignore"] for g in gt], kind="mergesort")
499
+ gt = [gt[i] for i in gtind]
500
+ dtind = np.argsort([-d["score"] for d in dt], kind="mergesort")
501
+ dt = [dt[i] for i in dtind[0:maxDet]]
502
+ iscrowd = [int(o["iscrowd"]) for o in gt]
503
+ # load computed ious
504
+ ious = (
505
+ self.ious[imgId, catId][:, gtind]
506
+ if len(self.ious[imgId, catId]) > 0
507
+ else self.ious[imgId, catId]
508
+ )
509
+
510
+ T = len(p.iouThrs)
511
+ G = len(gt)
512
+ D = len(dt)
513
+ gtm = np.zeros((T, G))
514
+ dtm = np.zeros((T, D))
515
+ gtIg = np.array([g["_ignore"] for g in gt])
516
+ dtIg = np.zeros((T, D))
517
+ if not len(ious) == 0:
518
+ for tind, t in enumerate(p.iouThrs):
519
+ for dind, d in enumerate(dt):
520
+ # information about best match so far (m=-1 -> unmatched)
521
+ iou = min([t, 1 - 1e-10])
522
+ m = -1
523
+ for gind, g in enumerate(gt):
524
+ # if this gt already matched, and not a crowd, continue
525
+ if gtm[tind, gind] > 0 and not iscrowd[gind]:
526
+ continue
527
+ # if dt matched to reg gt, and on ignore gt, stop
528
+ if m > -1 and gtIg[m] == 0 and gtIg[gind] == 1:
529
+ break
530
+ # continue to next gt unless better match made
531
+ if ious[dind, gind] < iou:
532
+ continue
533
+ # if match successful and best so far, store appropriately
534
+ iou = ious[dind, gind]
535
+ m = gind
536
+ # if match made store id of match for both dt and gt
537
+ if m == -1:
538
+ continue
539
+ dtIg[tind, dind] = gtIg[m]
540
+ dtm[tind, dind] = gt[m]["id"]
541
+ gtm[tind, m] = d["id"]
542
+ # set unmatched detections outside of area range to ignore
543
+ a = np.array([d["area"] < aRng[0] or d["area"] > aRng[1] for d in dt]).reshape(
544
+ (1, len(dt))
545
+ )
546
+ dtIg = np.logical_or(dtIg, np.logical_and(dtm == 0, np.repeat(a, T, 0)))
547
+ # store results for given image and category
548
+ return {
549
+ "image_id": imgId,
550
+ "category_id": catId,
551
+ "aRng": aRng,
552
+ "maxDet": maxDet,
553
+ "dtIds": [d["id"] for d in dt],
554
+ "gtIds": [g["id"] for g in gt],
555
+ "dtMatches": dtm,
556
+ "gtMatches": gtm,
557
+ "dtScores": [d["score"] for d in dt],
558
+ "gtIgnore": gtIg,
559
+ "dtIgnore": dtIg,
560
+ }
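Review note: the matching loop in `evaluateImg` is a greedy assignment at a single IoU threshold. The sketch below isolates that idea, assuming the IoU rows are already ordered by descending detection score. The function name `greedy_match` is hypothetical, and the ignore-flag handling (the early `break` when moving from regular to ignored ground truths, and the area-range filtering) is deliberately omitted.

```python
from typing import List, Optional, Sequence


def greedy_match(
    ious: Sequence[Sequence[float]],
    iou_thr: float,
    iscrowd: Optional[Sequence[bool]] = None,
) -> List[int]:
    """Return the matched gt index for each detection, or -1 if unmatched.

    ious[d][g] is the IoU of detection d with ground truth g; rows must be
    sorted so that higher-scoring detections come first.
    """
    num_gt = len(ious[0]) if ious else 0
    crowd = list(iscrowd) if iscrowd is not None else [False] * num_gt
    gt_taken = [False] * num_gt
    matches: List[int] = []
    for row in ious:
        best_iou = min(iou_thr, 1 - 1e-10)  # a match must reach the threshold
        best_g = -1
        for g, value in enumerate(row):
            # A regular gt can be matched once; a crowd gt can be reused.
            if gt_taken[g] and not crowd[g]:
                continue
            if value < best_iou:
                continue
            best_iou = value
            best_g = g
        if best_g != -1:
            gt_taken[best_g] = True
        matches.append(best_g)
    return matches


ious = [[0.9, 0.6], [0.8, 0.55], [0.3, 0.2]]
print(greedy_match(ious, iou_thr=0.5))
print(greedy_match(ious, iou_thr=0.5, iscrowd=[True, False]))
```

Because higher-scoring detections claim ground truths first, a lower-scoring duplicate of the same object ends up unmatched and is later counted as a false positive, unless the ground truth is a crowd region.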
561
+
562
+ def accumulate(self, p=None):
563
+ """
564
+ Accumulate per image evaluation results and store the result in self.eval
565
+ :param p: input params for evaluation
566
+ :return: None
567
+ """
568
+ print("Accumulating evaluation results...")
569
+ tic = time.time()
570
+ if not self.evalImgs:
571
+ print("Please run evaluate() first")
572
+ # allows input customized parameters
573
+ if p is None:
574
+ p = self.params
575
+ p.catIds = p.catIds if p.useCats == 1 else [-1]
576
+ T = len(p.iouThrs)
577
+ R = len(p.recThrs)
578
+ K = len(p.catIds) if p.useCats else 1
579
+ A = len(p.areaRng)
580
+ M = len(p.maxDets)
581
+ precision = -np.ones(
582
+ (T, R, K, A, M)
583
+ ) # -1 for the precision of absent categories
584
+ recall = -np.ones((T, K, A, M))
585
+ scores = -np.ones((T, R, K, A, M))
586
+
587
+ # create dictionary for future indexing
588
+ _pe = self._paramsEval
589
+ catIds = _pe.catIds if _pe.useCats else [-1]
590
+ setK = set(catIds)
591
+ setA = set(map(tuple, _pe.areaRng))
592
+ setM = set(_pe.maxDets)
593
+ setI = set(_pe.imgIds)
594
+ # get inds to evaluate
595
+ k_list = [n for n, k in enumerate(p.catIds) if k in setK]
596
+ m_list = [m for n, m in enumerate(p.maxDets) if m in setM]
597
+ a_list = [
598
+ n for n, a in enumerate(map(lambda x: tuple(x), p.areaRng)) if a in setA
599
+ ]
600
+ i_list = [n for n, i in enumerate(p.imgIds) if i in setI]
601
+ I0 = len(_pe.imgIds)
602
+ A0 = len(_pe.areaRng)
603
+ # retrieve E at each category, area range, and max number of detections
604
+ for k, k0 in enumerate(k_list):
605
+ Nk = k0 * A0 * I0
606
+ for a, a0 in enumerate(a_list):
607
+ Na = a0 * I0
608
+ for m, maxDet in enumerate(m_list):
609
+ E = [self.evalImgs[Nk + Na + i] for i in i_list]
610
+ E = [e for e in E if e is not None]
611
+ if len(E) == 0:
612
+ continue
613
+ dtScores = np.concatenate([e["dtScores"][0:maxDet] for e in E])
614
+
615
+ # different sorting method generates slightly different results.
616
+ # mergesort is used to be consistent with the Matlab implementation.
617
+ inds = np.argsort(-dtScores, kind="mergesort")
618
+ dtScoresSorted = dtScores[inds]
619
+
620
+ dtm = np.concatenate(
621
+ [e["dtMatches"][:, 0:maxDet] for e in E], axis=1
622
+ )[:, inds]
623
+ dtIg = np.concatenate(
624
+ [e["dtIgnore"][:, 0:maxDet] for e in E], axis=1
625
+ )[:, inds]
626
+ gtIg = np.concatenate([e["gtIgnore"] for e in E])
627
+ npig = np.count_nonzero(gtIg == 0)
628
+ if npig == 0:
629
+ continue
630
+ tps = np.logical_and(dtm, np.logical_not(dtIg))
631
+ fps = np.logical_and(np.logical_not(dtm), np.logical_not(dtIg))
632
+
633
+ tp_sum = np.cumsum(tps, axis=1).astype(dtype=dtype_float)
634
+ fp_sum = np.cumsum(fps, axis=1).astype(dtype=dtype_float)
635
+ for t, (tp, fp) in enumerate(zip(tp_sum, fp_sum)):
636
+ tp = np.array(tp)
637
+ fp = np.array(fp)
638
+ nd = len(tp)
639
+ rc = tp / npig
640
+ pr = tp / (fp + tp + np.spacing(1))
641
+ q = np.zeros((R,))
642
+ ss = np.zeros((R,))
643
+
644
+ if nd:
645
+ recall[t, k, a, m] = rc[-1]
646
+ else:
647
+ recall[t, k, a, m] = 0
648
+
649
+ # numpy is slow without cython optimization for accessing elements
650
+ # using Python lists gives a significant speed improvement
651
+ pr = pr.tolist()
652
+ q = q.tolist()
653
+
654
+ for i in range(nd - 1, 0, -1):
655
+ if pr[i] > pr[i - 1]:
656
+ pr[i - 1] = pr[i]
657
+
658
+ inds = np.searchsorted(rc, p.recThrs, side="left")
659
+ try:
660
+ for ri, pi in enumerate(inds):
661
+ q[ri] = pr[pi]
662
+ ss[ri] = dtScoresSorted[pi]
663
+ except IndexError:
664
+ pass
665
+ precision[t, :, k, a, m] = np.array(q)
666
+ scores[t, :, k, a, m] = np.array(ss)
667
+ self.eval = {
668
+ "params": p,
669
+ "counts": [T, R, K, A, M],
670
+ "date": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
671
+ "precision": precision,
672
+ "recall": recall,
673
+ "scores": scores,
674
+ }
675
+ toc = time.time()
676
+ print("DONE (t={:0.2f}s).".format(toc - tic))
677
+
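The interpolation step at the end of accumulate() can be tried in isolation. What follows is a minimal pure-Python sketch under stated assumptions, not the code from this commit: `interpolate_precision` is a hypothetical helper, the recall/precision values are made up, and `bisect_left` stands in for `np.searchsorted(..., side="left")`.

```python
from bisect import bisect_left


def interpolate_precision(recall, precision, rec_thrs):
    """Sample a monotone precision envelope at fixed recall thresholds.

    Hypothetical helper for illustration; `recall` must be non-decreasing.
    """
    pr = list(precision)
    # Sweep right to left so precision never increases with recall.
    for i in range(len(pr) - 1, 0, -1):
        if pr[i] > pr[i - 1]:
            pr[i - 1] = pr[i]
    sampled = [0.0] * len(rec_thrs)
    for ri, thr in enumerate(rec_thrs):
        # First index whose recall is >= thr, like searchsorted(side="left").
        pi = bisect_left(recall, thr)
        if pi < len(pr):
            sampled[ri] = pr[pi]
        # Thresholds above the highest recall reached keep precision 0.
    return sampled


# Four detections sorted by score: TP, TP, FP, TP, with four ground truths.
recall = [0.25, 0.5, 0.5, 0.75]
precision = [1.0, 1.0, 2 / 3, 0.75]
result = interpolate_precision(recall, precision, [0.0, 0.5, 0.75, 1.0])
print(result)
```

The dip to 2/3 after the false positive is lifted to 0.75 by the envelope, and the threshold 1.0 stays at 0 because that recall is never reached.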
678
+ def summarize(self):
679
+ """
680
+ Compute and display summary metrics for evaluation results.
681
+ Note this function can *only* be applied to the default parameter setting
682
+ """
683
+
684
+ def _summarize(ap=1, iouThr=None, areaRng="all", maxDets=100):
685
+ p = self.params
686
+ iStr = " {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}"
687
+ titleStr = "Average Precision" if ap == 1 else "Average Recall"
688
+ typeStr = "(AP)" if ap == 1 else "(AR)"
689
+ iouStr = (
690
+ "{:0.2f}:{:0.2f}".format(p.iouThrs[0], p.iouThrs[-1])
691
+ if iouThr is None
692
+ else "{:0.2f}".format(iouThr)
693
+ )
694
+
695
+ aind = [i for i, aRng in enumerate(p.areaRngLbl) if aRng == areaRng]
696
+ mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets]
697
+ if ap == 1:
698
+ # dimension of precision: [TxRxKxAxM]
699
+ s = self.eval["precision"]
700
+ # IoU
701
+ if iouThr is not None:
702
+ t = np.where(iouThr == p.iouThrs)[0]
703
+ s = s[t]
704
+ s = s[:, :, :, aind, mind]
705
+ else:
706
+ # dimension of recall: [TxKxAxM]
707
+ s = self.eval["recall"]
708
+ if iouThr is not None:
709
+ t = np.where(iouThr == p.iouThrs)[0]
710
+ s = s[t]
711
+ s = s[:, :, aind, mind]
712
+ if len(s[s > -1]) == 0:
713
+ mean_s = -1
714
+ else:
715
+ mean_s = np.mean(s[s > -1])
716
+ print(iStr.format(titleStr, typeStr, iouStr, areaRng, maxDets, mean_s))
717
+ return mean_s
718
+
719
+ def _summarizeDets():
720
+ stats = np.zeros((12,))
721
+ stats[0] = _summarize(1)
722
+ stats[1] = _summarize(1, iouThr=0.5, maxDets=self.params.maxDets[2])
723
+ stats[2] = _summarize(1, iouThr=0.75, maxDets=self.params.maxDets[2])
724
+ stats[3] = _summarize(1, areaRng="small", maxDets=self.params.maxDets[2])
725
+ stats[4] = _summarize(1, areaRng="medium", maxDets=self.params.maxDets[2])
726
+ stats[5] = _summarize(1, areaRng="large", maxDets=self.params.maxDets[2])
727
+ stats[6] = _summarize(0, maxDets=self.params.maxDets[0])
728
+ stats[7] = _summarize(0, maxDets=self.params.maxDets[1])
729
+ stats[8] = _summarize(0, maxDets=self.params.maxDets[2])
730
+ stats[9] = _summarize(0, areaRng="small", maxDets=self.params.maxDets[2])
731
+ stats[10] = _summarize(0, areaRng="medium", maxDets=self.params.maxDets[2])
732
+ stats[11] = _summarize(0, areaRng="large", maxDets=self.params.maxDets[2])
733
+ return stats
734
+
735
+ def _summarizeKps():
736
+ stats = np.zeros((10,))
737
+ stats[0] = _summarize(1, maxDets=20)
738
+ stats[1] = _summarize(1, maxDets=20, iouThr=0.5)
739
+ stats[2] = _summarize(1, maxDets=20, iouThr=0.75)
740
+ stats[3] = _summarize(1, maxDets=20, areaRng="medium")
741
+ stats[4] = _summarize(1, maxDets=20, areaRng="large")
742
+ stats[5] = _summarize(0, maxDets=20)
743
+ stats[6] = _summarize(0, maxDets=20, iouThr=0.5)
744
+ stats[7] = _summarize(0, maxDets=20, iouThr=0.75)
745
+ stats[8] = _summarize(0, maxDets=20, areaRng="medium")
746
+ stats[9] = _summarize(0, maxDets=20, areaRng="large")
747
+ return stats
748
+
749
+ if not self.eval:
750
+ raise Exception("Please run accumulate() first")
751
+ iouType = self.params.iouType
752
+ if iouType == "segm" or iouType == "bbox":
753
+ summarize = _summarizeDets
754
+ elif iouType == "keypoints":
755
+ summarize = _summarizeKps
756
+ self.stats = summarize()
757
+
758
+ def __str__(self):
759
+ self.summarize()
760
+
761
+
762
+ class Params:
763
+ """
764
+ Params for coco evaluation api
765
+ """
766
+
767
+ def setDetParams(self):
768
+ self.imgIds = []
769
+ self.catIds = []
770
+ # np.arange causes trouble: its data points can be slightly larger than the true values
771
+ self.iouThrs = np.linspace(
772
+ 0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True
773
+ )
774
+ self.recThrs = np.linspace(
775
+ 0.0, 1.00, int(np.round((1.00 - 0.0) / 0.01)) + 1, endpoint=True
776
+ )
777
+ self.maxDets = [1, 10, 100]
778
+ self.areaRng = [
779
+ [0 ** 2, 1e5 ** 2],
780
+ [0 ** 2, 32 ** 2],
781
+ [32 ** 2, 96 ** 2],
782
+ [96 ** 2, 1e5 ** 2],
783
+ ]
784
+ self.areaRngLbl = ["all", "small", "medium", "large"]
785
+ self.useCats = 1
786
+
787
+ def setKpParams(self):
788
+ self.imgIds = []
789
+ self.catIds = []
790
+ # np.arange causes trouble: its data points can be slightly larger than the true values
791
+ self.iouThrs = np.linspace(
792
+ 0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True
793
+ )
794
+ self.recThrs = np.linspace(
795
+ 0.0, 1.00, int(np.round((1.00 - 0.0) / 0.01)) + 1, endpoint=True
796
+ )
797
+ self.maxDets = [20]
798
+ self.areaRng = [[0 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
799
+ self.areaRngLbl = ["all", "medium", "large"]
800
+ self.useCats = 1
801
+ self.kpt_oks_sigmas = (
802
+ np.array(
803
+ [
804
+ 0.26,
805
+ 0.25,
806
+ 0.25,
807
+ 0.35,
808
+ 0.35,
809
+ 0.79,
810
+ 0.79,
811
+ 0.72,
812
+ 0.72,
813
+ 0.62,
814
+ 0.62,
815
+ 1.07,
816
+ 1.07,
817
+ 0.87,
818
+ 0.87,
819
+ 0.89,
820
+ 0.89,
821
+ ]
822
+ )
823
+ / 10.0
824
+ )
825
+
826
+ def __init__(self, iouType="segm"):
827
+ if iouType == "bbox":
828
+ self.setDetParams()
829
+ else:
830
+ raise Exception("iouType not supported")
831
+ self.iouType = iouType
832
+ # useSegm is deprecated
833
+ self.useSegm = None
834
+
835
+
836
+ # This code is basically a copy and paste from the original cocoapi file:
837
+ # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/coco.py
838
+ # with the following changes:
839
+ # * Instead of receiving the path to the annotation file, it receives a json object.
840
+ # * Commented out all parts of the code that depend on maskUtils, which is not needed
841
+ # for bounding box evaluation.
842
+
843
+ __author__ = "tylin"
844
+ __version__ = "2.0"
845
+
846
+ # Interface for accessing the Microsoft COCO dataset.
847
+
848
+ # Microsoft COCO is a large image dataset designed for object detection,
849
+ # segmentation, and caption generation. pycocotools is a Python API that
850
+ # assists in loading, parsing and visualizing the annotations in COCO.
851
+ # Please visit http://mscoco.org/ for more information on COCO, including
852
+ # for the data, paper, and tutorials. The exact format of the annotations
853
+ # is also described on the COCO website. For example usage of the pycocotools
854
+ # please see pycocotools_demo.ipynb. In addition to this API, please download both
855
+ # the COCO images and annotations in order to run the demo.
856
+
857
+ # An alternative to using the API is to load the annotations directly
858
+ # into a Python dictionary.
859
+ # Using the API provides additional utility functions. Note that this API
860
+ # supports both *instance* and *caption* annotations. In the case of
861
+ # captions not all functions are defined (e.g. categories are undefined).
862
+
863
+ # The following API functions are defined:
864
+ # COCO - COCO api class that loads COCO annotations and prepares data structures.
865
+ # decodeMask - Decode binary mask M encoded via run-length encoding.
866
+ # encodeMask - Encode binary mask M using run-length encoding.
867
+ # getAnnIds - Get ann ids that satisfy given filter conditions.
868
+ # getCatIds - Get cat ids that satisfy given filter conditions.
869
+ # getImgIds - Get img ids that satisfy given filter conditions.
870
+ # loadAnns - Load anns with the specified ids.
871
+ # loadCats - Load cats with the specified ids.
872
+ # loadImgs - Load imgs with the specified ids.
873
+ # annToMask - Convert segmentation in an annotation to binary mask.
874
+ # showAnns - Display the specified annotations.
875
+ # loadRes - Load algorithm results and create API for accessing them.
876
+ # download - Download COCO images from mscoco.org server.
877
+ # Throughout the API "ann"=annotation, "cat"=category, and "img"=image.
878
+ # Help on each function can be accessed by: "help COCO>function".
879
+
880
+ # See also COCO>decodeMask,
881
+ # COCO>encodeMask, COCO>getAnnIds, COCO>getCatIds,
882
+ # COCO>getImgIds, COCO>loadAnns, COCO>loadCats,
883
+ # COCO>loadImgs, COCO>annToMask, COCO>showAnns
884
+
885
+ # Microsoft COCO Toolbox. version 2.0
886
+ # Data, paper, and tutorials available at: http://mscoco.org/
887
+ # Code written by Piotr Dollar and Tsung-Yi Lin, 2014.
888
+ # Licensed under the Simplified BSD License [see bsd.txt]
889
+
890
+ import copy
891
+ import itertools
892
+ import json
893
+ # from . import mask as maskUtils
894
+ import os
895
+ import sys
896
+ import time
897
+ from collections import defaultdict
898
+
899
+ import matplotlib.pyplot as plt
900
+ import numpy as np
901
+ from matplotlib.collections import PatchCollection
902
+ from matplotlib.patches import Polygon
903
+
904
+ PYTHON_VERSION = sys.version_info[0]
905
+ if PYTHON_VERSION == 2:
906
+ from urllib import urlretrieve
907
+ elif PYTHON_VERSION == 3:
908
+ from urllib.request import urlretrieve
909
+
910
+
911
+ def _isArrayLike(obj):
912
+ return hasattr(obj, "__iter__") and hasattr(obj, "__len__")
913
+
914
+
915
+ class COCO:
916
+ def __init__(self, annotations=None):
917
+ """
918
+ Constructor of Microsoft COCO helper class for reading and visualizing annotations.
919
+ :param annotations (dict): COCO-format annotations, already loaded as a json object
921
+ :return:
922
+ """
923
+ # load dataset
924
+ self.dataset, self.anns, self.cats, self.imgs = dict(), dict(), dict(), dict()
925
+ self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list)
926
+ # Modified the original code to receive a json object instead of a path to a file
927
+ if annotations:
928
+ assert (
929
+ isinstance(annotations, dict)
930
+ ), f"annotation file format {type(annotations)} not supported."
931
+ self.dataset = annotations
932
+ self.createIndex()
933
+
934
+ def createIndex(self):
935
+ # create index
936
+ print("creating index...")
937
+ anns, cats, imgs = {}, {}, {}
938
+ imgToAnns, catToImgs = defaultdict(list), defaultdict(list)
939
+ if "annotations" in self.dataset:
940
+ for ann in self.dataset["annotations"]:
941
+ imgToAnns[ann["image_id"]].append(ann)
942
+ anns[ann["id"]] = ann
943
+
944
+ if "images" in self.dataset:
945
+ for img in self.dataset["images"]:
946
+ imgs[img["id"]] = img
947
+
948
+ if "categories" in self.dataset:
949
+ for cat in self.dataset["categories"]:
950
+ cats[cat["id"]] = cat
951
+
952
+ if "annotations" in self.dataset and "categories" in self.dataset:
953
+ for ann in self.dataset["annotations"]:
954
+ catToImgs[ann["category_id"]].append(ann["image_id"])
955
+
956
+ print("index created!")
957
+
958
+ # create class members
959
+ self.anns = anns
960
+ self.imgToAnns = imgToAnns
961
+ self.catToImgs = catToImgs
962
+ self.imgs = imgs
963
+ self.cats = cats
964
+
965
+ def info(self):
966
+ """
967
+ Print information about the annotation file.
968
+ :return:
969
+ """
970
+ for key, value in self.dataset["info"].items():
971
+ print("{}: {}".format(key, value))
972
+
973
+ def getAnnIds(self, imgIds=[], catIds=[], areaRng=[], iscrowd=None):
974
+ """
975
+ Get ann ids that satisfy given filter conditions. default skips that filter
976
+ :param imgIds (int array) : get anns for given imgs
977
+ catIds (int array) : get anns for given cats
978
+ areaRng (float array) : get anns for given area range (e.g. [0 inf])
979
+ iscrowd (boolean) : get anns for given crowd label (False or True)
980
+ :return: ids (int array) : integer array of ann ids
981
+ """
982
+ imgIds = imgIds if _isArrayLike(imgIds) else [imgIds]
983
+ catIds = catIds if _isArrayLike(catIds) else [catIds]
984
+
985
+ if len(imgIds) == len(catIds) == len(areaRng) == 0:
986
+ anns = self.dataset["annotations"]
987
+ else:
988
+ if not len(imgIds) == 0:
989
+ lists = [
990
+ self.imgToAnns[imgId] for imgId in imgIds if imgId in self.imgToAnns
991
+ ]
992
+ anns = list(itertools.chain.from_iterable(lists))
993
+ else:
994
+ anns = self.dataset["annotations"]
995
+ anns = (
996
+ anns
997
+ if len(catIds) == 0
998
+ else [ann for ann in anns if ann["category_id"] in catIds]
999
+ )
1000
+ anns = (
1001
+ anns
1002
+ if len(areaRng) == 0
1003
+ else [
1004
+ ann
1005
+ for ann in anns
1006
+ if ann["area"] > areaRng[0] and ann["area"] < areaRng[1]
1007
+ ]
1008
+ )
1009
+ if iscrowd is not None:
1010
+ ids = [ann["id"] for ann in anns if ann["iscrowd"] == iscrowd]
1011
+ else:
1012
+ ids = [ann["id"] for ann in anns]
1013
+ return ids
1014
+
1015
+ def getCatIds(self, catNms=[], supNms=[], catIds=[]):
1016
+ """
1017
+ Get cat ids that satisfy given filter conditions. default skips that filter.
1018
+ :param catNms (str array) : get cats for given cat names
1019
+ :param supNms (str array) : get cats for given supercategory names
1020
+ :param catIds (int array) : get cats for given cat ids
1021
+ :return: ids (int array) : integer array of cat ids
1022
+ """
1023
+ catNms = catNms if _isArrayLike(catNms) else [catNms]
1024
+ supNms = supNms if _isArrayLike(supNms) else [supNms]
1025
+ catIds = catIds if _isArrayLike(catIds) else [catIds]
1026
+
1027
+ if len(catNms) == len(supNms) == len(catIds) == 0:
1028
+ cats = self.dataset["categories"]
1029
+ else:
1030
+ cats = self.dataset["categories"]
1031
+ cats = (
1032
+ cats
1033
+ if len(catNms) == 0
1034
+ else [cat for cat in cats if cat["name"] in catNms]
1035
+ )
1036
+ cats = (
1037
+ cats
1038
+ if len(supNms) == 0
1039
+ else [cat for cat in cats if cat["supercategory"] in supNms]
1040
+ )
1041
+ cats = (
1042
+ cats
1043
+ if len(catIds) == 0
1044
+ else [cat for cat in cats if cat["id"] in catIds]
1045
+ )
1046
+ ids = [cat["id"] for cat in cats]
1047
+ return ids
1048
+
1049
+ def getImgIds(self, imgIds=[], catIds=[]):
1050
+ """
1051
+ Get img ids that satisfy given filter conditions.
1052
+ :param imgIds (int array) : get imgs for given ids
1053
+ :param catIds (int array) : get imgs with all given cats
1054
+ :return: ids (int array) : integer array of img ids
1055
+ """
1056
+ imgIds = imgIds if _isArrayLike(imgIds) else [imgIds]
1057
+ catIds = catIds if _isArrayLike(catIds) else [catIds]
1058
+
1059
+ if len(imgIds) == len(catIds) == 0:
1060
+ ids = self.imgs.keys()
1061
+ else:
1062
+ ids = set(imgIds)
1063
+ for i, catId in enumerate(catIds):
1064
+ if i == 0 and len(ids) == 0:
1065
+ ids = set(self.catToImgs[catId])
1066
+ else:
1067
+ ids &= set(self.catToImgs[catId])
1068
+ return list(ids)
1069
+
1070
+ def loadAnns(self, ids=[]):
1071
+ """
1072
+ Load anns with the specified ids.
1073
+ :param ids (int array) : integer ids specifying anns
1074
+ :return: anns (object array) : loaded ann objects
1075
+ """
1076
+ if _isArrayLike(ids):
1077
+ return [self.anns[id] for id in ids]
1078
+ elif type(ids) == int:
1079
+ return [self.anns[ids]]
1080
+
1081
+ def loadCats(self, ids=[]):
1082
+ """
1083
+ Load cats with the specified ids.
1084
+ :param ids (int array) : integer ids specifying cats
1085
+ :return: cats (object array) : loaded cat objects
1086
+ """
1087
+ if _isArrayLike(ids):
1088
+ return [self.cats[id] for id in ids]
1089
+ elif type(ids) == int:
1090
+ return [self.cats[ids]]
1091
+
1092
+ def loadImgs(self, ids=[]):
1093
+ """
1094
+ Load imgs with the specified ids.
1095
+ :param ids (int array) : integer ids specifying img
1096
+ :return: imgs (object array) : loaded img objects
1097
+ """
1098
+ if _isArrayLike(ids):
1099
+ return [self.imgs[id] for id in ids]
1100
+ elif type(ids) == int:
1101
+ return [self.imgs[ids]]
1102
+
1103
+ def showAnns(self, anns, draw_bbox=False):
1104
+ """
1105
+ Display the specified annotations.
1106
+ :param anns (array of object): annotations to display
1107
+ :return: None
1108
+ """
1109
+ if len(anns) == 0:
1110
+ return 0
1111
+ if "segmentation" in anns[0] or "keypoints" in anns[0]:
1112
+ datasetType = "instances"
1113
+ elif "caption" in anns[0]:
1114
+ datasetType = "captions"
1115
+ else:
1116
+ raise Exception("datasetType not supported")
1117
+ if datasetType == "instances":
1118
+ ax = plt.gca()
1119
+ ax.set_autoscale_on(False)
1120
+ polygons = []
1121
+ color = []
1122
+ for ann in anns:
1123
+ c = (np.random.random((1, 3)) * 0.6 + 0.4).tolist()[0]
1124
+ if "segmentation" in ann:
1125
+ if type(ann["segmentation"]) == list:
1126
+ # polygon
1127
+ for seg in ann["segmentation"]:
1128
+ poly = np.array(seg).reshape((int(len(seg) / 2), 2))
1129
+ polygons.append(Polygon(poly))
1130
+ color.append(c)
1131
+ else:
1132
+ raise NotImplementedError(
1133
+ "This type is not supported yet."
1134
+ )
1135
+ # # mask
1136
+ # t = self.imgs[ann['image_id']]
1137
+ # if type(ann['segmentation']['counts']) == list:
1138
+ # rle = maskUtils.frPyObjects([ann['segmentation']], t['height'], t['width'])
1139
+ # else:
1140
+ # rle = [ann['segmentation']]
1141
+ # m = maskUtils.decode(rle)
1142
+ # img = np.ones( (m.shape[0], m.shape[1], 3) )
1143
+ # if ann['iscrowd'] == 1:
1144
+ # color_mask = np.array([2.0,166.0,101.0])/255
1145
+ # if ann['iscrowd'] == 0:
1146
+ # color_mask = np.random.random((1, 3)).tolist()[0]
1147
+ # for i in range(3):
1148
+ # img[:,:,i] = color_mask[i]
1149
+ # ax.imshow(np.dstack( (img, m*0.5) ))
1150
+ if "keypoints" in ann and type(ann["keypoints"]) == list:
1151
+ # turn skeleton into zero-based index
1152
+ sks = np.array(self.loadCats(ann["category_id"])[0]["skeleton"]) - 1
1153
+ kp = np.array(ann["keypoints"])
1154
+ x = kp[0::3]
1155
+ y = kp[1::3]
1156
+ v = kp[2::3]
1157
+ for sk in sks:
1158
+ if np.all(v[sk] > 0):
1159
+ plt.plot(x[sk], y[sk], linewidth=3, color=c)
1160
+ plt.plot(
1161
+ x[v > 0],
1162
+ y[v > 0],
1163
+ "o",
1164
+ markersize=8,
1165
+ markerfacecolor=c,
1166
+ markeredgecolor="k",
1167
+ markeredgewidth=2,
1168
+ )
1169
+ plt.plot(
1170
+ x[v > 1],
1171
+ y[v > 1],
1172
+ "o",
1173
+ markersize=8,
1174
+ markerfacecolor=c,
1175
+ markeredgecolor=c,
1176
+ markeredgewidth=2,
1177
+ )
1178
+
1179
+ if draw_bbox:
1180
+ [bbox_x, bbox_y, bbox_w, bbox_h] = ann["bbox"]
1181
+ poly = [
1182
+ [bbox_x, bbox_y],
1183
+ [bbox_x, bbox_y + bbox_h],
1184
+ [bbox_x + bbox_w, bbox_y + bbox_h],
1185
+ [bbox_x + bbox_w, bbox_y],
1186
+ ]
1187
+ np_poly = np.array(poly).reshape((4, 2))
1188
+ polygons.append(Polygon(np_poly))
1189
+ color.append(c)
1190
+
1191
+ p = PatchCollection(polygons, facecolor=color, linewidths=0, alpha=0.4)
1192
+ ax.add_collection(p)
1193
+ p = PatchCollection(
1194
+ polygons, facecolor="none", edgecolors=color, linewidths=2
1195
+ )
1196
+ ax.add_collection(p)
1197
+ elif datasetType == "captions":
1198
+ for ann in anns:
1199
+ print(ann["caption"])
1200
+
1201
+ def loadRes(self, resFile):
1202
+ """
1203
+ Load result file and return a result api object.
1204
+ :param resFile (str) : file name of result file
1205
+ :return: res (obj) : result api object
1206
+ """
1207
+ res = COCO()
1208
+ res.dataset["images"] = [img for img in self.dataset["images"]]
1209
+
1210
+ print("Loading and preparing results...")
1211
+ tic = time.time()
1212
+ if type(resFile) == str or (PYTHON_VERSION == 2 and type(resFile) == unicode):
1213
+ anns = json.load(open(resFile))
1214
+ elif type(resFile) == np.ndarray:
1215
+ anns = self.loadNumpyAnnotations(resFile)
1216
+ else:
1217
+ anns = resFile
1218
+ assert type(anns) == list, "results is not an array of objects"
1219
+ annsImgIds = [ann["image_id"] for ann in anns]
1220
+ assert set(annsImgIds) == (
1221
+ set(annsImgIds) & set(self.getImgIds())
1222
+ ), "Results do not correspond to current coco set"
1223
+ if "caption" in anns[0]:
1224
+ raise NotImplementedError("Evaluating caption is not supported yet.")
1225
+ elif "bbox" in anns[0] and not anns[0]["bbox"] == []:
1226
+ res.dataset["categories"] = copy.deepcopy(self.dataset["categories"])
1227
+ for id, ann in enumerate(anns):
1228
+ bb = ann["bbox"]
1229
+ x1, x2, y1, y2 = [bb[0], bb[0] + bb[2], bb[1], bb[1] + bb[3]]
1230
+ if not "segmentation" in ann:
1231
+ ann["segmentation"] = [[x1, y1, x1, y2, x2, y2, x2, y1]]
1232
+ ann["area"] = bb[2] * bb[3]
1233
+ ann["id"] = id + 1
1234
+ ann["iscrowd"] = 0
1235
+ elif "segmentation" in anns[0]:
1236
+ raise NotImplementedError("Evaluating segmentation is not supported yet.")
1237
+ elif "keypoints" in anns[0]:
1238
+ raise NotImplementedError("Evaluating keypoints is not supported yet.")
1239
+ print("DONE (t={:0.2f}s)".format(time.time() - tic))
1240
+
1241
+ res.dataset["annotations"] = anns
1242
+ res.createIndex()
1243
+ return res
1244
+
1245
+ def download(self, tarDir=None, imgIds=[]):
1246
+ """
1247
+ Download COCO images from mscoco.org server.
1248
+ :param tarDir (str): COCO results directory name
1249
+ imgIds (list): images to be downloaded
1250
+ :return:
1251
+ """
1252
+ if tarDir is None:
1253
+ print("Please specify target directory")
1254
+ return -1
1255
+ if len(imgIds) == 0:
1256
+ imgs = self.imgs.values()
1257
+ else:
1258
+ imgs = self.loadImgs(imgIds)
1259
+ N = len(imgs)
1260
+ if not os.path.exists(tarDir):
1261
+ os.makedirs(tarDir)
1262
+ for i, img in enumerate(imgs):
1263
+ tic = time.time()
1264
+ fname = os.path.join(tarDir, img["file_name"])
1265
+ if not os.path.exists(fname):
1266
+ urlretrieve(img["coco_url"], fname)
1267
+ print(
1268
+ "downloaded {}/{} images (t={:0.1f}s)".format(i, N, time.time() - tic)
1269
+ )
1270
+
1271
+ def loadNumpyAnnotations(self, data):
1272
+ """
1273
+ Convert result data from a numpy array [Nx7] where each row contains {imageID,x1,y1,w,h,score,class}
1274
+ :param data (numpy.ndarray)
1275
+ :return: annotations (python nested list)
1276
+ """
1277
+ print("Converting ndarray to lists...")
1278
+ assert type(data) == np.ndarray
1279
+ print(data.shape)
1280
+ assert data.shape[1] == 7
1281
+ N = data.shape[0]
1282
+ ann = []
1283
+ for i in range(N):
1284
+ if i % 1000000 == 0:
1285
+ print("{}/{}".format(i, N))
1286
+ ann += [
1287
+ {
1288
+ "image_id": int(data[i, 0]),
1289
+ "bbox": [data[i, 1], data[i, 2], data[i, 3], data[i, 4]],
1290
+ "score": data[i, 5],
1291
+ "category_id": int(data[i, 6]),
1292
+ }
1293
+ ]
1294
+ return ann
1295
+
1296
+ def annToRLE(self, ann):
1297
+ """
1298
+ Convert annotation, which can be polygons or uncompressed RLE, to RLE.
1299
+ :return: rle (dict) : run-length encoding of the mask
1300
+ """
1301
+ t = self.imgs[ann["image_id"]]
1302
+ h, w = t["height"], t["width"]
1303
+ segm = ann["segmentation"]
1304
+ if type(segm) == list:
1305
+ raise NotImplementedError("This type is not supported yet.")
1306
+ # polygon -- a single object might consist of multiple parts
1307
+ # we merge all parts into one mask rle code
1308
+ # rles = maskUtils.frPyObjects(segm, h, w)
1309
+ # rle = maskUtils.merge(rles)
1310
+ elif type(segm["counts"]) == list:
1311
+ raise NotImplementedError("This type is not supported yet.")
1312
+ # uncompressed RLE
1313
+ # rle = maskUtils.frPyObjects(segm, h, w)
1314
+ else:
1315
+ # rle
1316
+ rle = ann["segmentation"]
1317
+ return rle
1318
+
1319
+ def annToMask(self, ann):
1320
+ """
1321
+ Convert annotation which can be polygons, uncompressed RLE, or RLE to binary mask.
1322
+ :return: binary mask (numpy 2D array)
1323
+ """
1324
+ rle = self.annToRLE(ann)
1325
+ # m = maskUtils.decode(rle)
1326
+ raise NotImplementedError("This type is not supported yet.")
1327
+ return m
1328
+
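As a usage sketch, the lookups that createIndex() builds can be reproduced on a tiny hand-written dataset. The dictionary below is an assumed minimal example of the COCO layout (ids, file name and category are made up), and `build_index` is a hypothetical stand-in for the method rather than part of this file.

```python
from collections import defaultdict

# Made-up minimal dataset in the layout passed to COCO(annotations=...).
dataset = {
    "images": [{"id": 1, "file_name": "a.jpg", "height": 100, "width": 200}],
    "categories": [{"id": 7, "name": "cat", "supercategory": "animal"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 7,
         "bbox": [10, 20, 30, 40], "area": 1200, "iscrowd": 0},
        {"id": 11, "image_id": 1, "category_id": 7,
         "bbox": [0, 0, 5, 5], "area": 25, "iscrowd": 0},
    ],
}


def build_index(dataset):
    """Hypothetical stand-in mirroring the dictionaries createIndex() fills."""
    anns = {ann["id"]: ann for ann in dataset.get("annotations", [])}
    imgs = {img["id"]: img for img in dataset.get("images", [])}
    cats = {cat["id"]: cat for cat in dataset.get("categories", [])}
    img_to_anns, cat_to_imgs = defaultdict(list), defaultdict(list)
    for ann in dataset.get("annotations", []):
        img_to_anns[ann["image_id"]].append(ann)
    # As in createIndex(), categories are only indexed when both keys exist.
    if "annotations" in dataset and "categories" in dataset:
        for ann in dataset["annotations"]:
            cat_to_imgs[ann["category_id"]].append(ann["image_id"])
    return anns, imgs, cats, img_to_anns, cat_to_imgs


anns, imgs, cats, img_to_anns, cat_to_imgs = build_index(dataset)
print(sorted(anns), [a["id"] for a in img_to_anns[1]], cat_to_imgs[7])
```

Note that the category-to-image list keeps one entry per annotation, so an image with two annotations of the same category appears twice.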
1329
+
1330
+ # Typings
1331
+ _TYPING_BOX = Tuple[float, float, float, float]
1332
+ _TYPING_SCORES = List[float]
1333
+ _TYPING_LABELS = List[int]
1334
+ _TYPING_BOXES = List[_TYPING_BOX]
1335
+ _TYPING_PRED_REF = Union[_TYPING_SCORES, _TYPING_LABELS, _TYPING_BOXES]
1336
+ _TYPING_PREDICTION = Dict[str, _TYPING_PRED_REF]
1337
+ _TYPING_REFERENCE = Dict[str, _TYPING_PRED_REF]
1338
+ _TYPING_PREDICTIONS = Dict[int, _TYPING_PREDICTION]
1339
+
1340
+
1341
+ def convert_to_xywh(boxes: torch.Tensor) -> torch.Tensor:
1342
+ """
1343
+ Convert bounding boxes from (xmin, ymin, xmax, ymax) format to (x, y, width, height) format.
1344
+
1345
+ Args:
1346
+ boxes (torch.Tensor): Tensor of shape (N, 4) representing bounding boxes in \
1347
+ (xmin, ymin, xmax, ymax) format.
1348
+
1349
+ Returns:
1350
+ torch.Tensor: Tensor of shape (N, 4) representing bounding boxes in (x, y, width, height) \
1351
+ format.
1352
+ """
1353
+ xmin, ymin, xmax, ymax = boxes.unbind(1)
1354
+ return torch.stack((xmin, ymin, xmax - xmin, ymax - ymin), dim=1)
1355
+
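For a concrete feel of the conversion, here is a dependency-free sketch on plain Python lists. `xyxy_to_xywh` is a hypothetical list-based counterpart of convert_to_xywh written only for illustration; the function above works on torch tensors.

```python
def xyxy_to_xywh(boxes):
    """Convert (xmin, ymin, xmax, ymax) boxes to (x, y, width, height)."""
    return [
        [xmin, ymin, xmax - xmin, ymax - ymin]
        for xmin, ymin, xmax, ymax in boxes
    ]


boxes_xywh = xyxy_to_xywh([[10.0, 20.0, 40.0, 60.0], [0.0, 0.0, 5.0, 5.0]])
# loadRes() derives the area of a detection as width * height of this format.
areas = [w * h for _, _, w, h in boxes_xywh]
print(boxes_xywh, areas)
```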
1356
+
1357
+ def create_common_coco_eval(
1358
+ coco_eval: COCOeval, img_ids: List[int], eval_imgs: np.ndarray
1359
+ ) -> None:
1360
+ """
1361
+ Create a common COCO evaluation by merging image IDs and evaluation images into the \
1362
+ coco_eval object.
1363
+
1364
+ Args:
1365
+ coco_eval: COCOeval evaluation object.
1366
+ img_ids (List[int]): List of image IDs.
1367
+ eval_imgs (np.ndarray): Array of evaluation images.
1368
+ """
1369
+ img_ids, eval_imgs = merge(img_ids, eval_imgs)
1370
+ img_ids = list(img_ids)
1371
+ eval_imgs = list(eval_imgs.flatten())
1372
+
1373
+ coco_eval.evalImgs = eval_imgs
1374
+ coco_eval.params.imgIds = img_ids
1375
+ coco_eval._paramsEval = copy.deepcopy(coco_eval.params)
1376
+
1377
+
1378
+ def merge(img_ids: List[int], eval_imgs: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
1379
+ """
1380
+ Merge image IDs and evaluation images from different processes.
1381
+
1382
+ Args:
1383
+ img_ids (List[int]): List of image ID arrays from different processes.
1384
+ eval_imgs (np.ndarray): Evaluation images from different processes.
1385
+
1386
+ Returns:
1387
+ Tuple[np.ndarray, np.ndarray]: Merged image IDs and evaluation images.
1388
+ """
1389
+ all_img_ids = all_gather(img_ids)
1390
+ all_eval_imgs = all_gather(eval_imgs)
1391
+
1392
+ merged_img_ids = []
1393
+ for p in all_img_ids:
1394
+ merged_img_ids.extend(p)
1395
+
1396
+ merged_eval_imgs = []
1397
+ for p in all_eval_imgs:
1398
+ merged_eval_imgs.append(p)
1399
+
1400
+ merged_img_ids = np.array(merged_img_ids)
1401
+ merged_eval_imgs = np.concatenate(merged_eval_imgs, 2)
1402
+
1403
+ # keep only unique (and in sorted order) images
1404
+ merged_img_ids, idx = np.unique(merged_img_ids, return_index=True)
1405
+ merged_eval_imgs = merged_eval_imgs[..., idx]
1406
+
1407
+ return merged_img_ids, merged_eval_imgs
1408
+
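The de-duplication at the end of merge() relies on np.unique returning sorted unique ids together with the index of each id's first occurrence. A minimal pure-Python sketch of that behaviour, with the hypothetical helper `unique_with_first_index` and made-up ids, might look like this:

```python
def unique_with_first_index(ids):
    """Return sorted unique ids and the index of each one's first occurrence."""
    first_seen = {}
    for index, value in enumerate(ids):
        # setdefault keeps the earliest index for a repeated id.
        first_seen.setdefault(value, index)
    unique_ids = sorted(first_seen)
    return unique_ids, [first_seen[value] for value in unique_ids]


# Image ids as they might arrive from two processes, with overlap.
gathered_ids = [3, 1, 3, 2, 1]
per_image_results = ["r3", "r1", "r3-dup", "r2", "r1-dup"]

unique_ids, idx = unique_with_first_index(gathered_ids)
# Select the matching results, like merged_eval_imgs[..., idx] in merge().
kept = [per_image_results[i] for i in idx]
print(unique_ids, idx, kept)
```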
1409
+
1410
+ def all_gather(data: List[int]) -> List[List[int]]:
1411
+ """
1412
+ Run all_gather on arbitrary picklable data (not necessarily tensors).
1413
+
1414
+ Args:
1415
+ data (List[int]): any picklable object
1416
+ Returns:
1417
+ List[List[int]]: list of data gathered from each rank
1418
+ """
1419
+ world_size = get_world_size()
1420
+ if world_size == 1:
1421
+ return [data]
1422
+
1423
+ # serialized to a Tensor
1424
+ buffer = pickle.dumps(data)
1425
+ storage = torch.ByteStorage.from_buffer(buffer)
1426
+ tensor = torch.ByteTensor(storage).to("cuda")
1427
+
1428
+ # obtain Tensor size of each rank
1429
+ local_size = torch.tensor([tensor.numel()], device="cuda")
1430
+ size_list = [torch.tensor([0], device="cuda") for _ in range(world_size)]
1431
+ dist.all_gather(size_list, local_size)
1432
+ size_list = [int(size.item()) for size in size_list]
1433
+ max_size = max(size_list)
1434
+
1435
+ # receiving Tensor from all ranks
1436
+ # we pad the tensor because torch all_gather does not support
1437
+ # gathering tensors of different shapes
1438
+ tensor_list = []
1439
+ for _ in size_list:
1440
+ tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
1441
+ if local_size != max_size:
1442
+ padding = torch.empty(
1443
+ size=(max_size - local_size,), dtype=torch.uint8, device="cuda"
1444
+ )
1445
+ tensor = torch.cat((tensor, padding), dim=0)
1446
+ dist.all_gather(tensor_list, tensor)
1447
+
1448
+ data_list = []
1449
+ for size, tensor in zip(size_list, tensor_list):
1450
+ buffer = tensor.cpu().numpy().tobytes()[:size]
1451
+ data_list.append(pickle.loads(buffer))
1452
+
1453
+ return data_list
1454
+
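The pad-then-truncate approach used in all_gather() can be tried without GPUs or torch.distributed. The sketch below simulates the ranks with a plain list; `simulate_all_gather` is a hypothetical name and nothing here talks to a real process group.

```python
import pickle


def simulate_all_gather(per_rank_data):
    """Mimic the serialize, pad, exchange, truncate, deserialize sequence."""
    # Each "rank" serializes its own object to bytes.
    buffers = [pickle.dumps(data) for data in per_rank_data]
    # Sizes are exchanged first so every rank knows the true lengths.
    size_list = [len(buffer) for buffer in buffers]
    max_size = max(size_list)
    # Buffers are padded to a common size because shapes must match.
    padded = [buffer + b"\x00" * (max_size - len(buffer)) for buffer in buffers]
    # After the exchange, each buffer is cut back to its true size and decoded.
    return [pickle.loads(buffer[:size]) for buffer, size in zip(padded, size_list)]


gathered = simulate_all_gather([[1, 2, 3], [4], list(range(100))])
print([len(item) for item in gathered])
```

Truncating before `pickle.loads` keeps the padding bytes out of the decoder, which is why the per-rank sizes have to be gathered before the payloads.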
1455
+
1456
+ def get_world_size() -> int:
1457
+ """
1458
+ Get the number of processes in the distributed environment.
1459
+
1460
+ Returns:
1461
+ int: Number of processes.
1462
+ """
1463
+ if not is_dist_avail_and_initialized():
1464
+ return 1
1465
+ return dist.get_world_size()
1466
+
1467
+
1468
+ def is_dist_avail_and_initialized() -> bool:
1469
+ """
1470
+ Check if distributed environment is available and initialized.
1471
+
1472
+ Returns:
1473
+ bool: True if distributed environment is available and initialized, False otherwise.
1474
+ """
1475
+ return dist.is_available() and dist.is_initialized()
1476
+
1477
+
1478
+ import contextlib
1479
+ import copy
1480
+ import os
1481
+ from typing import Dict, List, Union
1482
+
1483
+ import numpy as np
1484
+ import torch
1485
+
1486
+ from detection_metrics.pycocotools.coco import COCO
1487
+ from detection_metrics.pycocotools.cocoeval import COCOeval
1488
+ from detection_metrics.utils import (_TYPING_BOX, _TYPING_PREDICTIONS, convert_to_xywh,
1489
+ create_common_coco_eval)
1490
+
1491
+ _SUPPORTED_TYPES = ["bbox"]
1492
+
1493
+
1494
+ class COCOEvaluator(object):
1495
+ """
1496
+ Class to perform evaluation for the COCO dataset.
1497
+ """
1498
+
1499
+ def __init__(self, coco_gt: COCO, iou_types: List[str] = ["bbox"]):
1500
+ """
1501
+ Initializes COCOEvaluator with the ground truth COCO dataset and IoU types.
1502
+
1503
+ Args:
1504
+ coco_gt: The ground truth COCO dataset.
1505
+ iou_types: Intersection over Union (IoU) types for evaluation (Supported: "bbox").
1506
+ """
1507
+ self.coco_gt = copy.deepcopy(coco_gt)
1508
+
1509
+ self.coco_eval = {}
1510
+ for iou_type in iou_types:
1511
+ assert iou_type in _SUPPORTED_TYPES, ValueError(
1512
+ f"IoU type not supported {iou_type}"
1513
+ )
1514
+ self.coco_eval[iou_type] = COCOeval(self.coco_gt, iouType=iou_type)
1515
+
1516
+ self.iou_types = iou_types
1517
+ self.img_ids = []
1518
+ self.eval_imgs = {k: [] for k in iou_types}
1519
+
1520
+ def update(self, predictions: _TYPING_PREDICTIONS) -> None:
1521
+ """
1522
+ Update the evaluator with new predictions.
1523
+
1524
+ Args:
1525
+ predictions: The predictions to update.
1526
+ """
1527
+ img_ids = list(np.unique(list(predictions.keys())))
1528
+ self.img_ids.extend(img_ids)
1529
+
1530
+ for iou_type in self.iou_types:
1531
+ results = self.prepare(predictions, iou_type)
1532
+
1533
+ # suppress pycocotools prints
1534
+ with open(os.devnull, "w") as devnull:
1535
+ with contextlib.redirect_stdout(devnull):
1536
+ coco_dt = COCO.loadRes(self.coco_gt, results) if results else COCO()
1537
+ coco_eval = self.coco_eval[iou_type]
1538
+
1539
+ coco_eval.cocoDt = coco_dt
1540
+ coco_eval.params.imgIds = list(img_ids)
1541
+ eval_imgs = coco_eval.evaluate()
1542
+ self.eval_imgs[iou_type].append(eval_imgs)
1543
+
1544
+ def synchronize_between_processes(self) -> None:
1545
+ """
1546
+ Synchronizes evaluation images between processes.
1547
+ """
1548
+ for iou_type in self.iou_types:
1549
+ self.eval_imgs[iou_type] = np.concatenate(self.eval_imgs[iou_type], 2)
1550
+ create_common_coco_eval(
1551
+ self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type]
1552
+ )
1553
+
1554
+ def accumulate(self) -> None:
1555
+ """
1556
+ Accumulates the evaluation results.
1557
+ """
1558
+ for coco_eval in self.coco_eval.values():
1559
+ coco_eval.accumulate()
1560
+
1561
+ def summarize(self) -> None:
1562
+ """
1563
+ Prints the IoU metric and summarizes the evaluation results.
1564
+ """
1565
+ for iou_type, coco_eval in self.coco_eval.items():
1566
+ print("IoU metric: {}".format(iou_type))
1567
+ coco_eval.summarize()
1568
+
1569
+ def prepare(
1570
+ self, predictions: _TYPING_PREDICTIONS, iou_type: str
1571
+ ) -> List[Dict[str, Union[int, _TYPING_BOX, float]]]:
1572
+ """
1573
+ Prepares the predictions for COCO detection.
1574
+
1575
+ Args:
1576
+ predictions: The predictions to prepare.
1577
+ iou_type: The Intersection over Union (IoU) type for evaluation.
1578
+
1579
+ Returns:
1580
+ A list of dictionaries with the prepared predictions.
1581
+ """
1582
+ if iou_type == "bbox":
1583
+ return self.prepare_for_coco_detection(predictions)
1584
+ else:
1585
+ raise ValueError(f"IoU type not supported {iou_type}")
1586
+
1587
+ def _post_process_stats(
1588
+ self, stats, coco_eval_object, iou_type="bbox"
1589
+ ) -> Dict[str, float]:
1590
+ """
1591
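+ Maps the COCO summary statistics to descriptive metric names.
+
+ Args:
+ stats: The summary statistics computed by COCOeval.summarize().
+ coco_eval_object: The COCOeval object that produced the stats; its params.maxDets values are used in the metric names.
+ iou_type: The Intersection over Union (IoU) type for evaluation.
+
+ Returns:
+ A dictionary mapping each metric name to its value.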
1599
+ """
1600
+ if iou_type not in _SUPPORTED_TYPES:
1601
+ raise ValueError(f"iou_type '{iou_type}' not supported")
1602
+
1603
+ current_max_dets = coco_eval_object.params.maxDets
1604
+
1605
+ index_to_title = {
1606
+ "bbox": {
1607
+ 0: f"AP-IoU=0.50:0.95-area=all-maxDets={current_max_dets[2]}",
1608
+ 1: f"AP-IoU=0.50-area=all-maxDets={current_max_dets[2]}",
1609
+ 2: f"AP-IoU=0.75-area=all-maxDets={current_max_dets[2]}",
1610
+ 3: f"AP-IoU=0.50:0.95-area=small-maxDets={current_max_dets[2]}",
1611
+ 4: f"AP-IoU=0.50:0.95-area=medium-maxDets={current_max_dets[2]}",
1612
+ 5: f"AP-IoU=0.50:0.95-area=large-maxDets={current_max_dets[2]}",
1613
+ 6: f"AR-IoU=0.50:0.95-area=all-maxDets={current_max_dets[0]}",
1614
+ 7: f"AR-IoU=0.50:0.95-area=all-maxDets={current_max_dets[1]}",
1615
+ 8: f"AR-IoU=0.50:0.95-area=all-maxDets={current_max_dets[2]}",
1616
+ 9: f"AR-IoU=0.50:0.95-area=small-maxDets={current_max_dets[2]}",
1617
+ 10: f"AR-IoU=0.50:0.95-area=medium-maxDets={current_max_dets[2]}",
1618
+ 11: f"AR-IoU=0.50:0.95-area=large-maxDets={current_max_dets[2]}",
1619
+ },
1620
+ "keypoints": {
1621
+ 0: "AP-IoU=0.50:0.95-area=all-maxDets=20",
1622
+ 1: "AP-IoU=0.50-area=all-maxDets=20",
1623
+ 2: "AP-IoU=0.75-area=all-maxDets=20",
1624
+ 3: "AP-IoU=0.50:0.95-area=medium-maxDets=20",
1625
+ 4: "AP-IoU=0.50:0.95-area=large-maxDets=20",
1626
+ 5: "AR-IoU=0.50:0.95-area=all-maxDets=20",
1627
+ 6: "AR-IoU=0.50-area=all-maxDets=20",
1628
+ 7: "AR-IoU=0.75-area=all-maxDets=20",
1629
+ 8: "AR-IoU=0.50:0.95-area=medium-maxDets=20",
1630
+ 9: "AR-IoU=0.50:0.95-area=large-maxDets=20",
1631
+ },
1632
+ }
1633
+
1634
+ output_dict: Dict[str, float] = {}
1635
+ for index, stat in enumerate(stats):
1636
+ output_dict[index_to_title[iou_type][index]] = stat
1637
+
1638
+ return output_dict
1639
+
1640
+ def get_results(self) -> Dict[str, Dict[str, float]]:
1641
+ """
1642
+ Gets the results of the COCO evaluation.
1643
+
1644
+ Returns:
1645
+ A dictionary with the results of the COCO evaluation.
1646
+ """
1647
+ output_dict = {}
1648
+
1649
+ for iou_type, coco_eval in self.coco_eval.items():
1650
+ if iou_type == "segm":
1651
+ iou_type = "bbox"
1652
+ output_dict[f"iou_{iou_type}"] = self._post_process_stats(
1653
+ coco_eval.stats, coco_eval, iou_type
1654
+ )
1655
+ return output_dict
1656
+
1657
+ def prepare_for_coco_detection(
1658
+ self, predictions: _TYPING_PREDICTIONS
1659
+ ) -> List[Dict[str, Union[int, _TYPING_BOX, float]]]:
1660
+ """
1661
+ Prepares the predictions for COCO detection.
1662
+
1663
+ Args:
1664
+ predictions: The predictions to prepare.
1665
+
1666
+ Returns:
1667
+ A list of dictionaries with the prepared predictions.
1668
+ """
1669
+ coco_results = []
1670
+ for original_id, prediction in predictions.items():
1671
+ if len(prediction) == 0:
1672
+ continue
1673
+
1674
+ boxes = prediction["boxes"]
1675
+ if len(boxes) == 0:
1676
+ continue
1677
+
1678
+ if not isinstance(boxes, torch.Tensor):
1679
+ boxes = torch.as_tensor(boxes)
1680
+ boxes = boxes.tolist()
1681
+
1682
+ scores = prediction["scores"]
1683
+ if not isinstance(scores, list):
1684
+ scores = scores.tolist()
1685
+
1686
+ labels = prediction["labels"]
1687
+ if not isinstance(labels, list):
1688
+ labels = prediction["labels"].tolist()
1689
+
1690
+ coco_results.extend(
1691
+ [
1692
+ {
1693
+ "image_id": original_id,
1694
+ "category_id": labels[k],
1695
+ "bbox": box,
1696
+ "score": scores[k],
1697
+ }
1698
+ for k, box in enumerate(boxes)
1699
+ ]
1700
+ )
1701
+ return coco_results
1702
+
1703
 
1704
  _DESCRIPTION = "This class evaluates object detection models using the COCO dataset \
1705
  and its evaluation metrics."
 
1729
  **kwargs: Additional keyword arguments forwarded to evaluate.Metric.
1730
  """
1731
 
1732
+
1733
  class EvaluateObjectDetection(evaluate.Metric):
1734
  """
1735
  Class for evaluating object detection models.
 
1767
  self.coco_evaluator.coco_gt.cats = cats
1768
  self.coco_evaluator.coco_gt.dataset["categories"] = list(cats.values())
1769
  self.coco_evaluator.coco_eval["bbox"].params.catIds = [c["id"] for c in cats.values()]
1770
+
1771
  def _info(self):
1772
  """
1773
  Returns the MetricInfo object with information about the module.
 
1810
  )
1811
 
1812
  def _preprocess(
1813
+ self, predictions: List[Dict[str, torch.Tensor]]
1814
  ) -> List[_TYPING_PREDICTION]:
1815
  """
1816
  Preprocesses the predictions before computing the scores.
 
1840
  for prediction in predictions:
1841
  ret.append({k: v for k, v in prediction.items() if k in required})
1842
  return ret
1843
+
1844
  def _clear_references(self, references):
1845
  required = [""]
1846
  ret = []
1847
  for ref in references:
1848
  ret.append({k: v for k, v in ref.items() if k in required})
1849
  return ret
1850
+
1851
+ def add(self, *, prediction=None, reference=None, **kwargs):
1852
  """
1853
  Preprocesses the predictions and references and calls the parent class function.
1854
 
 
1860
  if prediction is not None:
1861
  prediction = self._clear_predictions(prediction)
1862
  prediction = self._preprocess(prediction)
1863
+
1864
  res = {} # {image_id} : prediction
1865
  for output, target in zip(prediction, reference):
1866
  res[target["image_id"][0]] = output
 
1869
  super(evaluate.Metric, self).add(prediction=prediction, references=reference, **kwargs)
1870
 
1871
  def _compute(
1872
+ self,
1873
+ predictions: List[List[_TYPING_PREDICTION]],
1874
+ references: List[List[_TYPING_REFERENCE]],
1875
  ) -> Dict[str, Dict[str, float]]:
1876
  """
1877
  Returns the evaluation scores.
 
1885
  """
1886
  print("Synchronizing processes")
1887
  self.coco_evaluator.synchronize_between_processes()
1888
+
1889
  print("Accumulating values")
1890
  self.coco_evaluator.accumulate()
1891
+
1892
  print("Summarizing results")
1893
  self.coco_evaluator.summarize()
1894
+
1895
  stats = self.coco_evaluator.get_results()
1896
  return stats
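For reviewers, the shape of the dictionary returned by get_results() can be illustrated with a small standalone sketch of the bbox branch of _post_process_stats. The function name label_bbox_stats and the placeholder values are hypothetical, and the default max_dets of (1, 10, 100) is assumed to match the usual pycocotools default rather than taken from this commit.

```python
from typing import Dict, Sequence


def label_bbox_stats(
    stats: Sequence[float], max_dets: Sequence[int] = (1, 10, 100)
) -> Dict[str, float]:
    """Pair each of the 12 bbox summary values with a descriptive key."""
    last = max_dets[2]
    titles = [
        f"AP-IoU=0.50:0.95-area=all-maxDets={last}",
        f"AP-IoU=0.50-area=all-maxDets={last}",
        f"AP-IoU=0.75-area=all-maxDets={last}",
        f"AP-IoU=0.50:0.95-area=small-maxDets={last}",
        f"AP-IoU=0.50:0.95-area=medium-maxDets={last}",
        f"AP-IoU=0.50:0.95-area=large-maxDets={last}",
        f"AR-IoU=0.50:0.95-area=all-maxDets={max_dets[0]}",
        f"AR-IoU=0.50:0.95-area=all-maxDets={max_dets[1]}",
        f"AR-IoU=0.50:0.95-area=all-maxDets={last}",
        f"AR-IoU=0.50:0.95-area=small-maxDets={last}",
        f"AR-IoU=0.50:0.95-area=medium-maxDets={last}",
        f"AR-IoU=0.50:0.95-area=large-maxDets={last}",
    ]
    # Fail loudly instead of silently dropping or mislabelling values.
    if len(stats) != len(titles):
        raise ValueError(f"expected {len(titles)} stats, got {len(stats)}")
    return dict(zip(titles, stats))


# Placeholder values only; real values would come from COCOeval.summarize().
example = label_bbox_stats([float(i) for i in range(12)])
```

In the diff, get_results() wraps a dictionary of this shape under the key "iou_bbox".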
detection_metrics/__init__.py DELETED
@@ -1 +0,0 @@
1
- __version__ = "0.0.3"
 
 
detection_metrics/coco_evaluate.py DELETED
@@ -1,225 +0,0 @@
1
- import contextlib
2
- import copy
3
- import os
4
- from typing import Dict, List, Union
5
-
6
- import numpy as np
7
- import torch
8
-
9
- from detection_metrics.pycocotools.coco import COCO
10
- from detection_metrics.pycocotools.cocoeval import COCOeval
11
- from detection_metrics.utils import (_TYPING_BOX, _TYPING_PREDICTIONS, convert_to_xywh,
12
- create_common_coco_eval)
13
-
14
- _SUPPORTED_TYPES = ["bbox"]
15
-
16
-
17
- class COCOEvaluator(object):
18
- """
19
- Class to perform evaluation for the COCO dataset.
20
- """
21
-
22
- def __init__(self, coco_gt: COCO, iou_types: List[str] = ["bbox"]):
23
- """
24
- Initializes COCOEvaluator with the ground truth COCO dataset and IoU types.
25
-
26
- Args:
27
- coco_gt: The ground truth COCO dataset.
28
- iou_types: Intersection over Union (IoU) types for evaluation (Supported: "bbox").
29
- """
30
- self.coco_gt = copy.deepcopy(coco_gt)
31
-
32
- self.coco_eval = {}
33
- for iou_type in iou_types:
34
- assert iou_type in _SUPPORTED_TYPES, ValueError(
35
- f"IoU type not supported {iou_type}"
36
- )
37
- self.coco_eval[iou_type] = COCOeval(self.coco_gt, iouType=iou_type)
38
-
39
- self.iou_types = iou_types
40
- self.img_ids = []
41
- self.eval_imgs = {k: [] for k in iou_types}
42
-
43
- def update(self, predictions: _TYPING_PREDICTIONS) -> None:
44
- """
45
- Update the evaluator with new predictions.
46
-
47
- Args:
48
- predictions: The predictions to update.
49
- """
50
- img_ids = list(np.unique(list(predictions.keys())))
51
- self.img_ids.extend(img_ids)
52
-
53
- for iou_type in self.iou_types:
54
- results = self.prepare(predictions, iou_type)
55
-
56
- # suppress pycocotools prints
57
- with open(os.devnull, "w") as devnull:
58
- with contextlib.redirect_stdout(devnull):
59
- coco_dt = COCO.loadRes(self.coco_gt, results) if results else COCO()
60
- coco_eval = self.coco_eval[iou_type]
61
-
62
- coco_eval.cocoDt = coco_dt
63
- coco_eval.params.imgIds = list(img_ids)
64
- eval_imgs = coco_eval.evaluate()
65
- self.eval_imgs[iou_type].append(eval_imgs)
66
-
67
-
68
- def synchronize_between_processes(self) -> None:
69
- """
70
- Synchronizes evaluation images between processes.
71
- """
72
- for iou_type in self.iou_types:
73
- self.eval_imgs[iou_type] = np.concatenate(self.eval_imgs[iou_type], 2)
74
- create_common_coco_eval(
75
- self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type]
76
- )
77
-
78
- def accumulate(self) -> None:
79
- """
80
- Accumulates the evaluation results.
81
- """
82
- for coco_eval in self.coco_eval.values():
83
- coco_eval.accumulate()
84
-
85
- def summarize(self) -> None:
86
- """
87
- Prints the IoU metric and summarizes the evaluation results.
88
- """
89
- for iou_type, coco_eval in self.coco_eval.items():
90
- print("IoU metric: {}".format(iou_type))
91
- coco_eval.summarize()
92
-
93
- def prepare(
94
- self, predictions: _TYPING_PREDICTIONS, iou_type: str
95
- ) -> List[Dict[str, Union[int, _TYPING_BOX, float]]]:
96
- """
97
- Prepares the predictions for COCO detection.
98
-
99
- Args:
100
- predictions: The predictions to prepare.
101
- iou_type: The Intersection over Union (IoU) type for evaluation.
102
-
103
- Returns:
104
- A dictionary with the prepared predictions.
105
- """
106
- if iou_type == "bbox":
107
- return self.prepare_for_coco_detection(predictions)
108
- else:
109
- raise ValueError(f"IoU type not supported {iou_type}")
110
-
111
- def _post_process_stats(
112
- self, stats, coco_eval_object, iou_type="bbox"
113
- ) -> Dict[str, float]:
114
- """
115
- Prepares the predictions for COCO detection.
116
-
117
- Args:
118
- predictions: The predictions to prepare.
119
- iou_type: The Intersection over Union (IoU) type for evaluation.
120
-
121
- Returns:
122
- A dictionary with the prepared predictions.
123
- """
124
- if iou_type not in _SUPPORTED_TYPES:
125
- raise ValueError(f"iou_type '{iou_type}' not supported")
126
-
127
- current_max_dets = coco_eval_object.params.maxDets
128
-
129
- index_to_title = {
130
- "bbox": {
131
- 0: f"AP-IoU=0.50:0.95-area=all-maxDets={current_max_dets[2]}",
132
- 1: f"AP-IoU=0.50-area=all-maxDets={current_max_dets[2]}",
133
- 2: f"AP-IoU=0.75-area=all-maxDets={current_max_dets[2]}",
134
- 3: f"AP-IoU=0.50:0.95-area=small-maxDets={current_max_dets[2]}",
135
- 4: f"AP-IoU=0.50:0.95-area=medium-maxDets={current_max_dets[2]}",
136
- 5: f"AP-IoU=0.50:0.95-area=large-maxDets={current_max_dets[2]}",
137
- 6: f"AR-IoU=0.50:0.95-area=all-maxDets={current_max_dets[0]}",
138
- 7: f"AR-IoU=0.50:0.95-area=all-maxDets={current_max_dets[1]}",
139
- 8: f"AR-IoU=0.50:0.95-area=all-maxDets={current_max_dets[2]}",
140
- 9: f"AR-IoU=0.50:0.95-area=small-maxDets={current_max_dets[2]}",
141
- 10: f"AR-IoU=0.50:0.95-area=medium-maxDets={current_max_dets[2]}",
142
- 11: f"AR-IoU=0.50:0.95-area=large-maxDets={current_max_dets[2]}",
143
- },
144
- "keypoints": {
145
- 0: "AP-IoU=0.50:0.95-area=all-maxDets=20",
146
- 1: "AP-IoU=0.50-area=all-maxDets=20",
147
- 2: "AP-IoU=0.75-area=all-maxDets=20",
148
- 3: "AP-IoU=0.50:0.95-area=medium-maxDets=20",
149
- 4: "AP-IoU=0.50:0.95-area=large-maxDets=20",
150
- 5: "AR-IoU=0.50:0.95-area=all-maxDets=20",
151
- 6: "AR-IoU=0.50-area=all-maxDets=20",
152
- 7: "AR-IoU=0.75-area=all-maxDets=20",
153
- 8: "AR-IoU=0.50:0.95-area=medium-maxDets=20",
154
- 9: "AR-IoU=0.50:0.95-area=large-maxDets=20",
155
- },
156
- }
157
-
158
- output_dict: Dict[str, float] = {}
159
- for index, stat in enumerate(stats):
160
- output_dict[index_to_title[iou_type][index]] = stat
161
-
162
- return output_dict
163
-
164
- def get_results(self) -> Dict[str, Dict[str, float]]:
165
- """
166
- Gets the results of the COCO evaluation.
167
-
168
- Returns:
169
- A dictionary with the results of the COCO evaluation.
170
- """
171
- output_dict = {}
172
-
173
- for iou_type, coco_eval in self.coco_eval.items():
174
- if iou_type == "segm":
175
- iou_type = "bbox"
176
- output_dict[f"iou_{iou_type}"] = self._post_process_stats(
177
- coco_eval.stats, coco_eval, iou_type
178
- )
179
- return output_dict
180
-
181
- def prepare_for_coco_detection(
182
- self, predictions: _TYPING_PREDICTIONS
183
- ) -> List[Dict[str, Union[int, _TYPING_BOX, float]]]:
184
- """
185
- Prepares the predictions for COCO detection.
186
-
187
- Args:
188
- predictions: The predictions to prepare.
189
-
190
- Returns:
191
- A list of dictionaries with the prepared predictions.
192
- """
193
- coco_results = []
194
- for original_id, prediction in predictions.items():
195
- if len(prediction) == 0:
196
- continue
197
-
198
- boxes = prediction["boxes"]
199
- if len(boxes) == 0:
200
- continue
201
-
202
- if not isinstance(boxes, torch.Tensor):
203
- boxes = torch.as_tensor(boxes)
204
- boxes = boxes.tolist()
205
-
206
- scores = prediction["scores"]
207
- if not isinstance(scores, list):
208
- scores = scores.tolist()
209
-
210
- labels = prediction["labels"]
211
- if not isinstance(labels, list):
212
- labels = prediction["labels"].tolist()
213
-
214
- coco_results.extend(
215
- [
216
- {
217
- "image_id": original_id,
218
- "category_id": labels[k],
219
- "bbox": box,
220
- "score": scores[k],
221
- }
222
- for k, box in enumerate(boxes)
223
- ]
224
- )
225
- return coco_results
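The flattening done by prepare_for_coco_detection can be sketched without torch. This is a minimal illustration that assumes plain Python lists as inputs; the name to_coco_results and the sample data are hypothetical. As in the function above, boxes are passed through unchanged, so the caller is assumed to supply them in the layout the evaluator expects.

```python
from typing import Any, Dict, List


def to_coco_results(predictions: Dict[int, Dict[str, list]]) -> List[Dict[str, Any]]:
    """Turn {image_id: {"boxes", "scores", "labels"}} into one record per box."""
    results: List[Dict[str, Any]] = []
    for image_id, prediction in predictions.items():
        # Skip images with no prediction dict or no boxes, as the original does.
        if not prediction or len(prediction["boxes"]) == 0:
            continue
        for box, score, label in zip(
            prediction["boxes"], prediction["scores"], prediction["labels"]
        ):
            results.append(
                {
                    "image_id": image_id,
                    "category_id": label,
                    "bbox": list(box),
                    "score": score,
                }
            )
    return results


# Hypothetical sample: image 7 has two boxes, image 8 has none.
predictions = {
    7: {
        "boxes": [[10.0, 20.0, 30.0, 40.0], [0.0, 0.0, 5.0, 5.0]],
        "scores": [0.9, 0.4],
        "labels": [1, 3],
    },
    8: {"boxes": [], "scores": [], "labels": []},
}
records = to_coco_results(predictions)
```

Images without boxes contribute no records, which is why update() falls back to an empty COCO() object when the result list is empty.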
 
detection_metrics/pycocotools/coco.py DELETED
@@ -1,491 +0,0 @@
1
- # This code is basically a copy and paste from the original cocoapi file:
2
- # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/coco.py
3
- # with the following changes:
4
- # * Instead of receiving the path to the annotation file, it receives a json object.
5
- # * Commented out all parts of code that depends on maskUtils, which is not needed
6
- # for bounding box evaluation.
7
-
8
- __author__ = "tylin"
9
- __version__ = "2.0"
10
- # Interface for accessing the Microsoft COCO dataset.
11
-
12
- # Microsoft COCO is a large image dataset designed for object detection,
13
- # segmentation, and caption generation. pycocotools is a Python API that
14
- # assists in loading, parsing and visualizing the annotations in COCO.
15
- # Please visit http://mscoco.org/ for more information on COCO, including
16
- # for the data, paper, and tutorials. The exact format of the annotations
17
- # is also described on the COCO website. For example usage of the pycocotools
18
- # please see pycocotools_demo.ipynb. In addition to this API, please download both
19
- # the COCO images and annotations in order to run the demo.
20
-
21
- # An alternative to using the API is to load the annotations directly
22
- # into Python dictionary
23
- # Using the API provides additional utility functions. Note that this API
24
- # supports both *instance* and *caption* annotations. In the case of
25
- # captions not all functions are defined (e.g. categories are undefined).
26
-
27
- # The following API functions are defined:
28
- # COCO - COCO api class that loads COCO annotation file and prepare data structures.
29
- # decodeMask - Decode binary mask M encoded via run-length encoding.
30
- # encodeMask - Encode binary mask M using run-length encoding.
31
- # getAnnIds - Get ann ids that satisfy given filter conditions.
32
- # getCatIds - Get cat ids that satisfy given filter conditions.
33
- # getImgIds - Get img ids that satisfy given filter conditions.
34
- # loadAnns - Load anns with the specified ids.
35
- # loadCats - Load cats with the specified ids.
36
- # loadImgs - Load imgs with the specified ids.
37
- # annToMask - Convert segmentation in an annotation to binary mask.
38
- # showAnns - Display the specified annotations.
39
- # loadRes - Load algorithm results and create API for accessing them.
40
- # download - Download COCO images from mscoco.org server.
41
- # Throughout the API "ann"=annotation, "cat"=category, and "img"=image.
42
- # Help on each functions can be accessed by: "help COCO>function".
43
-
44
- # See also COCO>decodeMask,
45
- # COCO>encodeMask, COCO>getAnnIds, COCO>getCatIds,
46
- # COCO>getImgIds, COCO>loadAnns, COCO>loadCats,
47
- # COCO>loadImgs, COCO>annToMask, COCO>showAnns
48
-
49
- # Microsoft COCO Toolbox. version 2.0
50
- # Data, paper, and tutorials available at: http://mscoco.org/
51
- # Code written by Piotr Dollar and Tsung-Yi Lin, 2014.
52
- # Licensed under the Simplified BSD License [see bsd.txt]
53
-
54
- import copy
55
- import itertools
56
- import json
57
- # from . import mask as maskUtils
58
- import os
59
- import sys
60
- import time
61
- from collections import defaultdict
62
-
63
- import matplotlib.pyplot as plt
64
- import numpy as np
65
- from matplotlib.collections import PatchCollection
66
- from matplotlib.patches import Polygon
67
-
68
- PYTHON_VERSION = sys.version_info[0]
69
- if PYTHON_VERSION == 2:
70
- from urllib import urlretrieve
71
- elif PYTHON_VERSION == 3:
72
- from urllib.request import urlretrieve
73
-
74
-
75
- def _isArrayLike(obj):
76
- return hasattr(obj, "__iter__") and hasattr(obj, "__len__")
77
-
78
-
79
- class COCO:
80
- def __init__(self, annotations=None):
81
- """
82
- Constructor of Microsoft COCO helper class for reading and visualizing annotations.
83
- :param annotation_file (str): location of annotation file
84
- :param image_folder (str): location to the folder that hosts images.
85
- :return:
86
- """
87
- # load dataset
88
- self.dataset, self.anns, self.cats, self.imgs = dict(), dict(), dict(), dict()
89
- self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list)
90
- # Modified the original code to receive a json object instead of a path to a file
91
- if annotations:
92
- assert (
93
- type(annotations) == dict
94
- ), f"annotation file format {type(annotations)} not supported."
95
- self.dataset = annotations
96
- self.createIndex()
97
-
98
- def createIndex(self):
99
- # create index
100
- print("creating index...")
101
- anns, cats, imgs = {}, {}, {}
102
- imgToAnns, catToImgs = defaultdict(list), defaultdict(list)
103
- if "annotations" in self.dataset:
104
- for ann in self.dataset["annotations"]:
105
- imgToAnns[ann["image_id"]].append(ann)
106
- anns[ann["id"]] = ann
107
-
108
- if "images" in self.dataset:
109
- for img in self.dataset["images"]:
110
- imgs[img["id"]] = img
111
-
112
- if "categories" in self.dataset:
113
- for cat in self.dataset["categories"]:
114
- cats[cat["id"]] = cat
115
-
116
- if "annotations" in self.dataset and "categories" in self.dataset:
117
- for ann in self.dataset["annotations"]:
118
- catToImgs[ann["category_id"]].append(ann["image_id"])
119
-
120
- print("index created!")
121
-
122
- # create class members
123
- self.anns = anns
124
- self.imgToAnns = imgToAnns
125
- self.catToImgs = catToImgs
126
- self.imgs = imgs
127
- self.cats = cats
128
-
129
- def info(self):
130
- """
131
- Print information about the annotation file.
132
- :return:
133
- """
134
- for key, value in self.dataset["info"].items():
135
- print("{}: {}".format(key, value))
136
-
137
- def getAnnIds(self, imgIds=[], catIds=[], areaRng=[], iscrowd=None):
138
- """
139
- Get ann ids that satisfy given filter conditions. default skips that filter
140
- :param imgIds (int array) : get anns for given imgs
141
- catIds (int array) : get anns for given cats
142
- areaRng (float array) : get anns for given area range (e.g. [0 inf])
143
- iscrowd (boolean) : get anns for given crowd label (False or True)
144
- :return: ids (int array) : integer array of ann ids
145
- """
146
- imgIds = imgIds if _isArrayLike(imgIds) else [imgIds]
147
- catIds = catIds if _isArrayLike(catIds) else [catIds]
148
-
149
- if len(imgIds) == len(catIds) == len(areaRng) == 0:
150
- anns = self.dataset["annotations"]
151
- else:
152
- if not len(imgIds) == 0:
153
- lists = [
154
- self.imgToAnns[imgId] for imgId in imgIds if imgId in self.imgToAnns
155
- ]
156
- anns = list(itertools.chain.from_iterable(lists))
157
- else:
158
- anns = self.dataset["annotations"]
159
- anns = (
160
- anns
161
- if len(catIds) == 0
162
- else [ann for ann in anns if ann["category_id"] in catIds]
163
- )
164
- anns = (
165
- anns
166
- if len(areaRng) == 0
167
- else [
168
- ann
169
- for ann in anns
170
- if ann["area"] > areaRng[0] and ann["area"] < areaRng[1]
171
- ]
172
- )
173
- if not iscrowd == None:
174
- ids = [ann["id"] for ann in anns if ann["iscrowd"] == iscrowd]
175
- else:
176
- ids = [ann["id"] for ann in anns]
177
- return ids
178
-
179
- def getCatIds(self, catNms=[], supNms=[], catIds=[]):
180
- """
181
- filtering parameters. default skips that filter.
182
- :param catNms (str array) : get cats for given cat names
183
- :param supNms (str array) : get cats for given supercategory names
184
- :param catIds (int array) : get cats for given cat ids
185
- :return: ids (int array) : integer array of cat ids
186
- """
187
- catNms = catNms if _isArrayLike(catNms) else [catNms]
188
- supNms = supNms if _isArrayLike(supNms) else [supNms]
189
- catIds = catIds if _isArrayLike(catIds) else [catIds]
190
-
191
- if len(catNms) == len(supNms) == len(catIds) == 0:
192
- cats = self.dataset["categories"]
193
- else:
194
- cats = self.dataset["categories"]
195
- cats = (
196
- cats
197
- if len(catNms) == 0
198
- else [cat for cat in cats if cat["name"] in catNms]
199
- )
200
- cats = (
201
- cats
202
- if len(supNms) == 0
203
- else [cat for cat in cats if cat["supercategory"] in supNms]
204
- )
205
- cats = (
206
- cats
207
- if len(catIds) == 0
208
- else [cat for cat in cats if cat["id"] in catIds]
209
- )
210
- ids = [cat["id"] for cat in cats]
211
- return ids
212
-
213
- def getImgIds(self, imgIds=[], catIds=[]):
214
- """
215
- Get img ids that satisfy given filter conditions.
216
- :param imgIds (int array) : get imgs for given ids
217
- :param catIds (int array) : get imgs with all given cats
218
- :return: ids (int array) : integer array of img ids
219
- """
220
- imgIds = imgIds if _isArrayLike(imgIds) else [imgIds]
221
- catIds = catIds if _isArrayLike(catIds) else [catIds]
222
-
223
- if len(imgIds) == len(catIds) == 0:
224
- ids = self.imgs.keys()
225
- else:
226
- ids = set(imgIds)
227
- for i, catId in enumerate(catIds):
228
- if i == 0 and len(ids) == 0:
229
- ids = set(self.catToImgs[catId])
230
- else:
231
- ids &= set(self.catToImgs[catId])
232
- return list(ids)
233
-
234
- def loadAnns(self, ids=[]):
235
- """
236
- Load anns with the specified ids.
237
- :param ids (int array) : integer ids specifying anns
238
- :return: anns (object array) : loaded ann objects
239
- """
240
- if _isArrayLike(ids):
241
- return [self.anns[id] for id in ids]
242
- elif type(ids) == int:
243
- return [self.anns[ids]]
244
-
245
- def loadCats(self, ids=[]):
246
- """
247
- Load cats with the specified ids.
248
- :param ids (int array) : integer ids specifying cats
249
- :return: cats (object array) : loaded cat objects
250
- """
251
- if _isArrayLike(ids):
252
- return [self.cats[id] for id in ids]
253
- elif type(ids) == int:
254
- return [self.cats[ids]]
255
-
256
- def loadImgs(self, ids=[]):
257
- """
258
- Load anns with the specified ids.
259
- :param ids (int array) : integer ids specifying img
260
- :return: imgs (object array) : loaded img objects
261
- """
262
- if _isArrayLike(ids):
263
- return [self.imgs[id] for id in ids]
264
- elif type(ids) == int:
265
- return [self.imgs[ids]]
266
-
267
- def showAnns(self, anns, draw_bbox=False):
268
- """
269
- Display the specified annotations.
270
- :param anns (array of object): annotations to display
271
- :return: None
272
- """
273
- if len(anns) == 0:
274
- return 0
275
- if "segmentation" in anns[0] or "keypoints" in anns[0]:
276
- datasetType = "instances"
277
- elif "caption" in anns[0]:
278
- datasetType = "captions"
279
- else:
280
- raise Exception("datasetType not supported")
281
- if datasetType == "instances":
282
- ax = plt.gca()
283
- ax.set_autoscale_on(False)
284
- polygons = []
285
- color = []
286
- for ann in anns:
287
- c = (np.random.random((1, 3)) * 0.6 + 0.4).tolist()[0]
288
- if "segmentation" in ann:
289
- if type(ann["segmentation"]) == list:
290
- # polygon
291
- for seg in ann["segmentation"]:
292
- poly = np.array(seg).reshape((int(len(seg) / 2), 2))
293
- polygons.append(Polygon(poly))
294
- color.append(c)
295
- else:
296
- raise NotImplementedError(
297
- "This type is not is not supported yet."
298
- )
299
- # # mask
300
- # t = self.imgs[ann['image_id']]
301
- # if type(ann['segmentation']['counts']) == list:
302
- # rle = maskUtils.frPyObjects([ann['segmentation']], t['height'], t['width'])
303
- # else:
304
- # rle = [ann['segmentation']]
305
- # m = maskUtils.decode(rle)
306
- # img = np.ones( (m.shape[0], m.shape[1], 3) )
307
- # if ann['iscrowd'] == 1:
308
- # color_mask = np.array([2.0,166.0,101.0])/255
309
- # if ann['iscrowd'] == 0:
310
- # color_mask = np.random.random((1, 3)).tolist()[0]
311
- # for i in range(3):
312
- # img[:,:,i] = color_mask[i]
313
- # ax.imshow(np.dstack( (img, m*0.5) ))
314
- if "keypoints" in ann and type(ann["keypoints"]) == list:
315
- # turn skeleton into zero-based index
316
- sks = np.array(self.loadCats(ann["category_id"])[0]["skeleton"]) - 1
317
- kp = np.array(ann["keypoints"])
318
- x = kp[0::3]
319
- y = kp[1::3]
320
- v = kp[2::3]
321
- for sk in sks:
322
- if np.all(v[sk] > 0):
323
- plt.plot(x[sk], y[sk], linewidth=3, color=c)
324
- plt.plot(
325
- x[v > 0],
326
- y[v > 0],
327
- "o",
328
- markersize=8,
329
- markerfacecolor=c,
330
- markeredgecolor="k",
331
- markeredgewidth=2,
332
- )
333
- plt.plot(
334
- x[v > 1],
335
- y[v > 1],
336
- "o",
337
- markersize=8,
338
- markerfacecolor=c,
339
- markeredgecolor=c,
340
- markeredgewidth=2,
341
- )
342
-
343
- if draw_bbox:
344
- [bbox_x, bbox_y, bbox_w, bbox_h] = ann["bbox"]
345
- poly = [
346
- [bbox_x, bbox_y],
347
- [bbox_x, bbox_y + bbox_h],
348
- [bbox_x + bbox_w, bbox_y + bbox_h],
349
- [bbox_x + bbox_w, bbox_y],
350
- ]
351
- np_poly = np.array(poly).reshape((4, 2))
352
- polygons.append(Polygon(np_poly))
353
- color.append(c)
354
-
355
- p = PatchCollection(polygons, facecolor=color, linewidths=0, alpha=0.4)
356
- ax.add_collection(p)
357
- p = PatchCollection(
358
- polygons, facecolor="none", edgecolors=color, linewidths=2
359
- )
360
- ax.add_collection(p)
361
- elif datasetType == "captions":
362
- for ann in anns:
363
- print(ann["caption"])
364
-
365
- def loadRes(self, resFile):
366
- """
367
- Load result file and return a result api object.
368
- :param resFile (str) : file name of result file
369
- :return: res (obj) : result api object
370
- """
371
- res = COCO()
372
- res.dataset["images"] = [img for img in self.dataset["images"]]
373
-
374
- print("Loading and preparing results...")
375
- tic = time.time()
376
- if type(resFile) == str or (PYTHON_VERSION == 2 and type(resFile) == unicode):
377
- anns = json.load(open(resFile))
378
- elif type(resFile) == np.ndarray:
379
- anns = self.loadNumpyAnnotations(resFile)
380
- else:
381
- anns = resFile
382
- assert type(anns) == list, "results in not an array of objects"
383
- annsImgIds = [ann["image_id"] for ann in anns]
384
- assert set(annsImgIds) == (
385
- set(annsImgIds) & set(self.getImgIds())
386
- ), "Results do not correspond to current coco set"
387
- if "caption" in anns[0]:
388
- raise NotImplementedError("Evaluating caption is not supported yet.")
389
- elif "bbox" in anns[0] and not anns[0]["bbox"] == []:
390
- res.dataset["categories"] = copy.deepcopy(self.dataset["categories"])
391
- for id, ann in enumerate(anns):
392
- bb = ann["bbox"]
393
- x1, x2, y1, y2 = [bb[0], bb[0] + bb[2], bb[1], bb[1] + bb[3]]
394
- if not "segmentation" in ann:
395
- ann["segmentation"] = [[x1, y1, x1, y2, x2, y2, x2, y1]]
396
- ann["area"] = bb[2] * bb[3]
397
- ann["id"] = id + 1
398
- ann["iscrowd"] = 0
399
- elif "segmentation" in anns[0]:
400
- raise NotImplementedError("Evaluating caption is not supported yet.")
401
- elif "keypoints" in anns[0]:
402
- raise NotImplementedError("Evaluating caption is not supported yet.")
403
- print("DONE (t={:0.2f}s)".format(time.time() - tic))
404
-
405
- res.dataset["annotations"] = anns
406
- res.createIndex()
407
- return res
408
-
409
- def download(self, tarDir=None, imgIds=[]):
410
- """
411
- Download COCO images from mscoco.org server.
412
- :param tarDir (str): COCO results directory name
413
- imgIds (list): images to be downloaded
414
- :return:
415
- """
416
- if tarDir is None:
417
- print("Please specify target directory")
418
- return -1
419
- if len(imgIds) == 0:
420
- imgs = self.imgs.values()
421
- else:
422
- imgs = self.loadImgs(imgIds)
423
- N = len(imgs)
424
- if not os.path.exists(tarDir):
425
- os.makedirs(tarDir)
426
- for i, img in enumerate(imgs):
427
- tic = time.time()
428
- fname = os.path.join(tarDir, img["file_name"])
429
- if not os.path.exists(fname):
430
- urlretrieve(img["coco_url"], fname)
431
- print(
432
- "downloaded {}/{} images (t={:0.1f}s)".format(i, N, time.time() - tic)
433
- )
434
-
435
- def loadNumpyAnnotations(self, data):
436
- """
437
- Convert result data from a numpy array [Nx7] where each row contains {imageID,x1,y1,w,h,score,class}
438
- :param data (numpy.ndarray)
439
- :return: annotations (python nested list)
440
- """
441
- print("Converting ndarray to lists...")
442
- assert type(data) == np.ndarray
443
- print(data.shape)
444
- assert data.shape[1] == 7
445
- N = data.shape[0]
446
- ann = []
447
- for i in range(N):
448
- if i % 1000000 == 0:
449
- print("{}/{}".format(i, N))
450
- ann += [
451
- {
452
- "image_id": int(data[i, 0]),
453
- "bbox": [data[i, 1], data[i, 2], data[i, 3], data[i, 4]],
454
- "score": data[i, 5],
455
- "category_id": int(data[i, 6]),
456
- }
457
- ]
458
- return ann
459
-
460
- def annToRLE(self, ann):
461
- """
462
- Convert annotation which can be polygons, uncompressed RLE to RLE.
463
- :return: binary mask (numpy 2D array)
464
- """
465
- t = self.imgs[ann["image_id"]]
466
- h, w = t["height"], t["width"]
467
- segm = ann["segmentation"]
468
- if type(segm) == list:
469
- raise NotImplementedError("This type is not is not supported yet.")
470
- # polygon -- a single object might consist of multiple parts
471
- # we merge all parts into one mask rle code
472
- # rles = maskUtils.frPyObjects(segm, h, w)
473
- # rle = maskUtils.merge(rles)
474
- elif type(segm["counts"]) == list:
475
- raise NotImplementedError("This type is not is not supported yet.")
476
- # uncompressed RLE
477
- # rle = maskUtils.frPyObjects(segm, h, w)
478
- else:
479
- # rle
480
- rle = ann["segmentation"]
481
- return rle
482
-
483
- def annToMask(self, ann):
484
- """
485
- Convert annotation which can be polygons, uncompressed RLE, or RLE to binary mask.
486
- :return: binary mask (numpy 2D array)
487
- """
488
- rle = self.annToRLE(ann)
489
- # m = maskUtils.decode(rle)
490
- raise NotImplementedError("This type is not is not supported yet.")
491
- return m
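The bounding-box branch of loadRes above fills in the fields COCOeval needs from each [x, y, width, height] box. The following standalone sketch mirrors that step; the name enrich_bbox_results is hypothetical, and unlike the original it copies each record instead of mutating it in place.

```python
from typing import Any, Dict, List


def enrich_bbox_results(anns: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Add segmentation, area, id and iscrowd to bbox-only result records."""
    enriched: List[Dict[str, Any]] = []
    for index, ann in enumerate(anns):
        x, y, w, h = ann["bbox"]
        x1, x2, y1, y2 = x, x + w, y, y + h
        item = dict(ann)  # shallow copy so the caller's records stay untouched
        # Rectangle polygon in the same corner order as loadRes uses;
        # an existing segmentation is kept as-is.
        item.setdefault("segmentation", [[x1, y1, x1, y2, x2, y2, x2, y1]])
        item["area"] = w * h
        item["id"] = index + 1  # ids are 1-based
        item["iscrowd"] = 0
        enriched.append(item)
    return enriched


# Hypothetical single detection.
result = enrich_bbox_results(
    [{"image_id": 1, "category_id": 2, "bbox": [10, 20, 30, 40], "score": 0.5}]
)
```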
 
detection_metrics/pycocotools/cocoeval.py DELETED
@@ -1,631 +0,0 @@
1
- # This code is basically a copy and paste from the original cocoapi repo:
2
- # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py
3
- # with the following changes:
4
- # * Replace the usage of mask (maskUtils) by MaskEvaluator.
5
- # * Comment out prints in the evaluate() function.
6
- # * Include a return of the function evaluate. Inspired
7
- # by @ybelkada (https://huggingface.co/spaces/ybelkada/cocoevaluate/)
8
-
9
- __author__ = "tsungyi"
10
-
11
- import copy
12
- import datetime
13
- import time
14
- from collections import defaultdict
15
- from packaging import version
16
-
17
- import numpy as np
18
-
19
- if version.parse(np.__version__) < version.parse("1.24"):
20
- dtype_float = np.float
21
- else:
22
- dtype_float = np.float64
23
-
24
- from .mask_utils import MaskEvaluator as maskUtils
25
-
26
- class COCOeval:
27
- # Interface for evaluating detection on the Microsoft COCO dataset.
28
- #
29
- # The usage for CocoEval is as follows:
30
- # cocoGt=..., cocoDt=... # load dataset and results
31
- # E = CocoEval(cocoGt,cocoDt); # initialize CocoEval object
32
- # E.params.recThrs = ...; # set parameters as desired
33
- # E.evaluate(); # run per image evaluation
34
- # E.accumulate(); # accumulate per image results
35
- # E.summarize(); # display summary metrics of results
36
- # For example usage see evalDemo.m and http://mscoco.org/.
37
- #
38
- # The evaluation parameters are as follows (defaults in brackets):
39
- # imgIds - [all] N img ids to use for evaluation
40
- # catIds - [all] K cat ids to use for evaluation
41
- # iouThrs - [.5:.05:.95] T=10 IoU thresholds for evaluation
42
- # recThrs - [0:.01:1] R=101 recall thresholds for evaluation
43
- # areaRng - [...] A=4 object area ranges for evaluation
44
- # maxDets - [1 10 100] M=3 thresholds on max detections per image
45
- # iouType - ['segm'] set iouType to 'segm', 'bbox' or 'keypoints'
46
- # iouType replaced the now DEPRECATED useSegm parameter.
47
- # useCats - [1] if true use category labels for evaluation
48
- # Note: if useCats=0 category labels are ignored as in proposal scoring.
49
- # Note: multiple areaRngs [Ax2] and maxDets [Mx1] can be specified.
50
- #
51
- # evaluate(): evaluates detections on every image and every category and
52
- # concats the results into the "evalImgs" with fields:
53
- # dtIds - [1xD] id for each of the D detections (dt)
54
- # gtIds - [1xG] id for each of the G ground truths (gt)
55
- # dtMatches - [TxD] matching gt id at each IoU or 0
56
- # gtMatches - [TxG] matching dt id at each IoU or 0
57
- # dtScores - [1xD] confidence of each dt
58
- # gtIgnore - [1xG] ignore flag for each gt
59
- # dtIgnore - [TxD] ignore flag for each dt at each IoU
60
- #
61
- # accumulate(): accumulates the per-image, per-category evaluation
62
- # results in "evalImgs" into the dictionary "eval" with fields:
63
- # params - parameters used for evaluation
64
- # date - date evaluation was performed
65
- # counts - [T,R,K,A,M] parameter dimensions (see above)
66
- # precision - [TxRxKxAxM] precision for every evaluation setting
67
- # recall - [TxKxAxM] max recall for every evaluation setting
68
- # Note: precision and recall==-1 for settings with no gt objects.
69
- #
70
- # See also coco, mask, pycocoDemo, pycocoEvalDemo
71
- #
72
- # Microsoft COCO Toolbox. version 2.0
73
- # Data, paper, and tutorials available at: http://mscoco.org/
74
- # Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
75
- # Licensed under the Simplified BSD License [see coco/license.txt]
76
- def __init__(self, cocoGt=None, cocoDt=None, iouType="segm"):
77
- """
78
- Initialize CocoEval using coco APIs for gt and dt
79
- :param cocoGt: coco object with ground truth annotations
80
- :param cocoDt: coco object with detection results
81
- :return: None
82
- """
83
- if not iouType:
84
- print("iouType not specified. use default iouType segm")
85
- self.cocoGt = cocoGt # ground truth COCO API
86
- self.cocoDt = cocoDt # detections COCO API
87
- self.evalImgs = defaultdict(
88
- list
89
- ) # per-image per-category evaluation results [KxAxI] elements
90
- self.eval = {} # accumulated evaluation results
91
- self._gts = defaultdict(list) # gt for evaluation
92
- self._dts = defaultdict(list) # dt for evaluation
93
- self.params = Params(iouType=iouType) # parameters
94
- self._paramsEval = {} # parameters for evaluation
95
- self.stats = [] # result summarization
96
- self.ious = {} # ious between all gts and dts
97
- if not cocoGt is None:
98
- self.params.imgIds = sorted(cocoGt.getImgIds())
99
- self.params.catIds = sorted(cocoGt.getCatIds())
100
-
101
- def _prepare(self):
102
- """
103
- Prepare ._gts and ._dts for evaluation based on params
104
- :return: None
105
- """
106
-
107
- def _toMask(anns, coco):
108
- # modify ann['segmentation'] by reference
109
- for ann in anns:
110
- rle = coco.annToRLE(ann)
111
- ann["segmentation"] = rle
112
-
113
- p = self.params
114
- if p.useCats:
115
- gts = self.cocoGt.loadAnns(
116
- self.cocoGt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds)
117
- )
118
- dts = self.cocoDt.loadAnns(
119
- self.cocoDt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds)
120
- )
121
- else:
122
- gts = self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds))
123
- dts = self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds))
124
-
125
- # convert ground truth to mask if iouType == 'segm'
126
- if p.iouType == "segm":
127
- _toMask(gts, self.cocoGt)
128
- _toMask(dts, self.cocoDt)
129
- # set ignore flag
130
- for gt in gts:
131
- gt["ignore"] = gt["ignore"] if "ignore" in gt else 0
132
- gt["ignore"] = "iscrowd" in gt and gt["iscrowd"]
133
- if p.iouType == "keypoints":
134
- gt["ignore"] = (gt["num_keypoints"] == 0) or gt["ignore"]
135
- self._gts = defaultdict(list) # gt for evaluation
136
- self._dts = defaultdict(list) # dt for evaluation
137
- for gt in gts:
138
- self._gts[gt["image_id"], gt["category_id"]].append(gt)
139
- for dt in dts:
140
- self._dts[dt["image_id"], dt["category_id"]].append(dt)
141
- self.evalImgs = defaultdict(list) # per-image per-category evaluation results
142
- self.eval = {} # accumulated evaluation results
143
-
144
- def evaluate(self):
145
- """
146
- Run per image evaluation on given images and store results (a list of dict) in self.evalImgs
147
- :return: None
148
- """
149
- # tic = time.time()
150
- # print("Running per image evaluation...")
151
- p = self.params
152
- # add backward compatibility if useSegm is specified in params
153
- if not p.useSegm is None:
154
- p.iouType = "segm" if p.useSegm == 1 else "bbox"
155
- # print(
156
- # "useSegm (deprecated) is not None. Running {} evaluation".format(
157
- # p.iouType
158
- # )
159
- # )
160
- # print("Evaluate annotation type *{}*".format(p.iouType))
161
- p.imgIds = list(np.unique(p.imgIds))
162
- if p.useCats:
163
- p.catIds = list(np.unique(p.catIds))
164
- p.maxDets = sorted(p.maxDets)
165
- self.params = p
166
-
167
- self._prepare()
168
- # loop through images, area range, max detection number
169
- catIds = p.catIds if p.useCats else [-1]
170
-
171
- if p.iouType == "segm" or p.iouType == "bbox":
172
- computeIoU = self.computeIoU
173
- elif p.iouType == "keypoints":
174
- computeIoU = self.computeOks
175
- self.ious = {
176
- (imgId, catId): computeIoU(imgId, catId)
177
- for imgId in p.imgIds
178
- for catId in catIds
179
- }
180
-
181
- evaluateImg = self.evaluateImg
182
- maxDet = p.maxDets[-1]
183
- self.evalImgs = [
184
- evaluateImg(imgId, catId, areaRng, maxDet)
185
- for catId in catIds
186
- for areaRng in p.areaRng
187
- for imgId in p.imgIds
188
- ]
189
- self._paramsEval = copy.deepcopy(self.params)
190
- ret_evalImgs = np.asarray(self.evalImgs).reshape(
191
- len(catIds), len(p.areaRng), len(p.imgIds)
192
- )
193
- # toc = time.time()
194
- # print("DONE (t={:0.2f}s).".format(toc - tic))
195
- return ret_evalImgs
196
-
197
- def computeIoU(self, imgId, catId):
198
- p = self.params
199
- if p.useCats:
200
- gt = self._gts[imgId, catId]
201
- dt = self._dts[imgId, catId]
202
- else:
203
- gt = [_ for cId in p.catIds for _ in self._gts[imgId, cId]]
204
- dt = [_ for cId in p.catIds for _ in self._dts[imgId, cId]]
205
- if len(gt) == 0 and len(dt) == 0:
206
- return []
207
- inds = np.argsort([-d["score"] for d in dt], kind="mergesort")
208
- dt = [dt[i] for i in inds]
209
- if len(dt) > p.maxDets[-1]:
210
- dt = dt[0 : p.maxDets[-1]]
211
-
212
- if p.iouType == "segm":
213
- g = [g["segmentation"] for g in gt]
214
- d = [d["segmentation"] for d in dt]
215
- elif p.iouType == "bbox":
216
- g = [g["bbox"] for g in gt]
217
- d = [d["bbox"] for d in dt]
218
- else:
219
- raise Exception("unknown iouType for iou computation")
220
-
221
- # compute iou between each dt and gt region
222
- iscrowd = [int(o["iscrowd"]) for o in gt]
223
- ious = maskUtils.iou(d, g, iscrowd)
224
- return ious
225
-
226
- def computeOks(self, imgId, catId):
227
- p = self.params
228
- # dimension here should be Nxm
229
- gts = self._gts[imgId, catId]
230
- dts = self._dts[imgId, catId]
231
- inds = np.argsort([-d["score"] for d in dts], kind="mergesort")
232
- dts = [dts[i] for i in inds]
233
- if len(dts) > p.maxDets[-1]:
234
- dts = dts[0 : p.maxDets[-1]]
235
- # if len(gts) == 0 and len(dts) == 0:
236
- if len(gts) == 0 or len(dts) == 0:
237
- return []
238
- ious = np.zeros((len(dts), len(gts)))
239
- sigmas = p.kpt_oks_sigmas
240
- vars = (sigmas * 2) ** 2
241
- k = len(sigmas)
242
- # compute oks between each detection and ground truth object
243
- for j, gt in enumerate(gts):
244
- # create bounds for ignore regions(double the gt bbox)
245
- g = np.array(gt["keypoints"])
246
- xg = g[0::3]
247
- yg = g[1::3]
248
- vg = g[2::3]
249
- k1 = np.count_nonzero(vg > 0)
250
- bb = gt["bbox"]
251
- x0 = bb[0] - bb[2]
252
- x1 = bb[0] + bb[2] * 2
253
- y0 = bb[1] - bb[3]
254
- y1 = bb[1] + bb[3] * 2
255
- for i, dt in enumerate(dts):
256
- d = np.array(dt["keypoints"])
257
- xd = d[0::3]
258
- yd = d[1::3]
259
- if k1 > 0:
260
- # measure the per-keypoint distance if keypoints visible
261
- dx = xd - xg
262
- dy = yd - yg
263
- else:
264
- # measure minimum distance to keypoints in (x0,y0) & (x1,y1)
265
- z = np.zeros((k))
266
- dx = np.max((z, x0 - xd), axis=0) + np.max((z, xd - x1), axis=0)
267
- dy = np.max((z, y0 - yd), axis=0) + np.max((z, yd - y1), axis=0)
268
- e = (dx**2 + dy**2) / vars / (gt["area"] + np.spacing(1)) / 2
269
- if k1 > 0:
270
- e = e[vg > 0]
271
- ious[i, j] = np.sum(np.exp(-e)) / e.shape[0]
272
- return ious
273
-
274
- def evaluateImg(self, imgId, catId, aRng, maxDet):
275
- """
276
- perform evaluation for single category and image
277
- :return: dict (single image results)
278
- """
279
- p = self.params
280
- if p.useCats:
281
- gt = self._gts[imgId, catId]
282
- dt = self._dts[imgId, catId]
283
- else:
284
- gt = [_ for cId in p.catIds for _ in self._gts[imgId, cId]]
285
- dt = [_ for cId in p.catIds for _ in self._dts[imgId, cId]]
286
- if len(gt) == 0 and len(dt) == 0:
287
- return None
288
-
289
- for g in gt:
290
- if g["ignore"] or (g["area"] < aRng[0] or g["area"] > aRng[1]):
291
- g["_ignore"] = 1
292
- else:
293
- g["_ignore"] = 0
294
-
295
- # sort dt highest score first, sort gt ignore last
296
- gtind = np.argsort([g["_ignore"] for g in gt], kind="mergesort")
297
- gt = [gt[i] for i in gtind]
298
- dtind = np.argsort([-d["score"] for d in dt], kind="mergesort")
299
- dt = [dt[i] for i in dtind[0:maxDet]]
300
- iscrowd = [int(o["iscrowd"]) for o in gt]
301
- # load computed ious
302
- ious = (
303
- self.ious[imgId, catId][:, gtind]
304
- if len(self.ious[imgId, catId]) > 0
305
- else self.ious[imgId, catId]
306
- )
307
-
308
- T = len(p.iouThrs)
309
- G = len(gt)
310
- D = len(dt)
311
- gtm = np.zeros((T, G))
312
- dtm = np.zeros((T, D))
313
- gtIg = np.array([g["_ignore"] for g in gt])
314
- dtIg = np.zeros((T, D))
315
- if not len(ious) == 0:
316
- for tind, t in enumerate(p.iouThrs):
317
- for dind, d in enumerate(dt):
318
- # information about best match so far (m=-1 -> unmatched)
319
- iou = min([t, 1 - 1e-10])
320
- m = -1
321
- for gind, g in enumerate(gt):
322
- # if this gt already matched, and not a crowd, continue
323
- if gtm[tind, gind] > 0 and not iscrowd[gind]:
324
- continue
325
- # if dt matched to reg gt, and on ignore gt, stop
326
- if m > -1 and gtIg[m] == 0 and gtIg[gind] == 1:
327
- break
328
- # continue to next gt unless better match made
329
- if ious[dind, gind] < iou:
330
- continue
331
- # if match successful and best so far, store appropriately
332
- iou = ious[dind, gind]
333
- m = gind
334
- # if match made store id of match for both dt and gt
335
- if m == -1:
336
- continue
337
- dtIg[tind, dind] = gtIg[m]
338
- dtm[tind, dind] = gt[m]["id"]
339
- gtm[tind, m] = d["id"]
340
- # set unmatched detections outside of area range to ignore
341
- a = np.array([d["area"] < aRng[0] or d["area"] > aRng[1] for d in dt]).reshape(
342
- (1, len(dt))
343
- )
344
- dtIg = np.logical_or(dtIg, np.logical_and(dtm == 0, np.repeat(a, T, 0)))
345
- # store results for given image and category
346
- return {
347
- "image_id": imgId,
348
- "category_id": catId,
349
- "aRng": aRng,
350
- "maxDet": maxDet,
351
- "dtIds": [d["id"] for d in dt],
352
- "gtIds": [g["id"] for g in gt],
353
- "dtMatches": dtm,
354
- "gtMatches": gtm,
355
- "dtScores": [d["score"] for d in dt],
356
- "gtIgnore": gtIg,
357
- "dtIgnore": dtIg,
358
- }
359
-
360
- def accumulate(self, p=None):
361
- """
362
- Accumulate per image evaluation results and store the result in self.eval
363
- :param p: input params for evaluation
364
- :return: None
365
- """
366
- print("Accumulating evaluation results...")
367
- tic = time.time()
368
- if not self.evalImgs:
369
- print("Please run evaluate() first")
370
- # allows input customized parameters
371
- if p is None:
372
- p = self.params
373
- p.catIds = p.catIds if p.useCats == 1 else [-1]
374
- T = len(p.iouThrs)
375
- R = len(p.recThrs)
376
- K = len(p.catIds) if p.useCats else 1
377
- A = len(p.areaRng)
378
- M = len(p.maxDets)
379
- precision = -np.ones(
380
- (T, R, K, A, M)
381
- ) # -1 for the precision of absent categories
382
- recall = -np.ones((T, K, A, M))
383
- scores = -np.ones((T, R, K, A, M))
384
-
385
- # create dictionary for future indexing
386
- _pe = self._paramsEval
387
- catIds = _pe.catIds if _pe.useCats else [-1]
388
- setK = set(catIds)
389
- setA = set(map(tuple, _pe.areaRng))
390
- setM = set(_pe.maxDets)
391
- setI = set(_pe.imgIds)
392
- # get inds to evaluate
393
- k_list = [n for n, k in enumerate(p.catIds) if k in setK]
394
- m_list = [m for n, m in enumerate(p.maxDets) if m in setM]
395
- a_list = [
396
- n for n, a in enumerate(map(lambda x: tuple(x), p.areaRng)) if a in setA
397
- ]
398
- i_list = [n for n, i in enumerate(p.imgIds) if i in setI]
399
- I0 = len(_pe.imgIds)
400
- A0 = len(_pe.areaRng)
401
- # retrieve E at each category, area range, and max number of detections
402
- for k, k0 in enumerate(k_list):
403
- Nk = k0 * A0 * I0
404
- for a, a0 in enumerate(a_list):
405
- Na = a0 * I0
406
- for m, maxDet in enumerate(m_list):
407
- E = [self.evalImgs[Nk + Na + i] for i in i_list]
408
- E = [e for e in E if not e is None]
409
- if len(E) == 0:
410
- continue
411
- dtScores = np.concatenate([e["dtScores"][0:maxDet] for e in E])
412
-
413
- # different sorting method generates slightly different results.
414
- # mergesort is used to be consistent with the Matlab implementation.
415
- inds = np.argsort(-dtScores, kind="mergesort")
416
- dtScoresSorted = dtScores[inds]
417
-
418
- dtm = np.concatenate(
419
- [e["dtMatches"][:, 0:maxDet] for e in E], axis=1
420
- )[:, inds]
421
- dtIg = np.concatenate(
422
- [e["dtIgnore"][:, 0:maxDet] for e in E], axis=1
423
- )[:, inds]
424
- gtIg = np.concatenate([e["gtIgnore"] for e in E])
425
- npig = np.count_nonzero(gtIg == 0)
426
- if npig == 0:
427
- continue
428
- tps = np.logical_and(dtm, np.logical_not(dtIg))
429
- fps = np.logical_and(np.logical_not(dtm), np.logical_not(dtIg))
430
-
431
- tp_sum = np.cumsum(tps, axis=1).astype(dtype=dtype_float)
432
- fp_sum = np.cumsum(fps, axis=1).astype(dtype=dtype_float)
433
- for t, (tp, fp) in enumerate(zip(tp_sum, fp_sum)):
434
- tp = np.array(tp)
435
- fp = np.array(fp)
436
- nd = len(tp)
437
- rc = tp / npig
438
- pr = tp / (fp + tp + np.spacing(1))
439
- q = np.zeros((R,))
440
- ss = np.zeros((R,))
441
-
442
- if nd:
443
- recall[t, k, a, m] = rc[-1]
444
- else:
445
- recall[t, k, a, m] = 0
446
-
447
- # element-wise access on numpy arrays is slow without cython optimization;
448
- # converting to python lists gives a significant speed improvement
449
- pr = pr.tolist()
450
- q = q.tolist()
451
-
452
- for i in range(nd - 1, 0, -1):
453
- if pr[i] > pr[i - 1]:
454
- pr[i - 1] = pr[i]
455
-
456
- inds = np.searchsorted(rc, p.recThrs, side="left")
457
- try:
458
- for ri, pi in enumerate(inds):
459
- q[ri] = pr[pi]
460
- ss[ri] = dtScoresSorted[pi]
461
- except IndexError:
462
- pass
463
- precision[t, :, k, a, m] = np.array(q)
464
- scores[t, :, k, a, m] = np.array(ss)
465
- self.eval = {
466
- "params": p,
467
- "counts": [T, R, K, A, M],
468
- "date": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
469
- "precision": precision,
470
- "recall": recall,
471
- "scores": scores,
472
- }
473
- toc = time.time()
474
- print("DONE (t={:0.2f}s).".format(toc - tic))
475
-
476
- def summarize(self):
477
- """
478
- Compute and display summary metrics for evaluation results.
479
- Note this function can *only* be applied on the default parameter setting
480
- """
481
-
482
- def _summarize(ap=1, iouThr=None, areaRng="all", maxDets=100):
483
- p = self.params
484
- iStr = " {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}"
485
- titleStr = "Average Precision" if ap == 1 else "Average Recall"
486
- typeStr = "(AP)" if ap == 1 else "(AR)"
487
- iouStr = (
488
- "{:0.2f}:{:0.2f}".format(p.iouThrs[0], p.iouThrs[-1])
489
- if iouThr is None
490
- else "{:0.2f}".format(iouThr)
491
- )
492
-
493
- aind = [i for i, aRng in enumerate(p.areaRngLbl) if aRng == areaRng]
494
- mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets]
495
- if ap == 1:
496
- # dimension of precision: [TxRxKxAxM]
497
- s = self.eval["precision"]
498
- # IoU
499
- if iouThr is not None:
500
- t = np.where(iouThr == p.iouThrs)[0]
501
- s = s[t]
502
- s = s[:, :, :, aind, mind]
503
- else:
504
- # dimension of recall: [TxKxAxM]
505
- s = self.eval["recall"]
506
- if iouThr is not None:
507
- t = np.where(iouThr == p.iouThrs)[0]
508
- s = s[t]
509
- s = s[:, :, aind, mind]
510
- if len(s[s > -1]) == 0:
511
- mean_s = -1
512
- else:
513
- mean_s = np.mean(s[s > -1])
514
- print(iStr.format(titleStr, typeStr, iouStr, areaRng, maxDets, mean_s))
515
- return mean_s
516
-
517
- def _summarizeDets():
518
- stats = np.zeros((12,))
519
- stats[0] = _summarize(1)
520
- stats[1] = _summarize(1, iouThr=0.5, maxDets=self.params.maxDets[2])
521
- stats[2] = _summarize(1, iouThr=0.75, maxDets=self.params.maxDets[2])
522
- stats[3] = _summarize(1, areaRng="small", maxDets=self.params.maxDets[2])
523
- stats[4] = _summarize(1, areaRng="medium", maxDets=self.params.maxDets[2])
524
- stats[5] = _summarize(1, areaRng="large", maxDets=self.params.maxDets[2])
525
- stats[6] = _summarize(0, maxDets=self.params.maxDets[0])
526
- stats[7] = _summarize(0, maxDets=self.params.maxDets[1])
527
- stats[8] = _summarize(0, maxDets=self.params.maxDets[2])
528
- stats[9] = _summarize(0, areaRng="small", maxDets=self.params.maxDets[2])
529
- stats[10] = _summarize(0, areaRng="medium", maxDets=self.params.maxDets[2])
530
- stats[11] = _summarize(0, areaRng="large", maxDets=self.params.maxDets[2])
531
- return stats
532
-
533
- def _summarizeKps():
534
- stats = np.zeros((10,))
535
- stats[0] = _summarize(1, maxDets=20)
536
- stats[1] = _summarize(1, maxDets=20, iouThr=0.5)
537
- stats[2] = _summarize(1, maxDets=20, iouThr=0.75)
538
- stats[3] = _summarize(1, maxDets=20, areaRng="medium")
539
- stats[4] = _summarize(1, maxDets=20, areaRng="large")
540
- stats[5] = _summarize(0, maxDets=20)
541
- stats[6] = _summarize(0, maxDets=20, iouThr=0.5)
542
- stats[7] = _summarize(0, maxDets=20, iouThr=0.75)
543
- stats[8] = _summarize(0, maxDets=20, areaRng="medium")
544
- stats[9] = _summarize(0, maxDets=20, areaRng="large")
545
- return stats
546
-
547
- if not self.eval:
548
- raise Exception("Please run accumulate() first")
549
- iouType = self.params.iouType
550
- if iouType == "segm" or iouType == "bbox":
551
- summarize = _summarizeDets
552
- elif iouType == "keypoints":
553
- summarize = _summarizeKps
554
- self.stats = summarize()
555
-
556
- def __str__(self):
557
- self.summarize()
558
-
559
-
560
- class Params:
561
- """
562
- Params for coco evaluation api
563
- """
564
-
565
- def setDetParams(self):
566
- self.imgIds = []
567
- self.catIds = []
568
- # np.arange causes trouble. the data point on arange is slightly larger than the true value
569
- self.iouThrs = np.linspace(
570
- 0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True
571
- )
572
- self.recThrs = np.linspace(
573
- 0.0, 1.00, int(np.round((1.00 - 0.0) / 0.01)) + 1, endpoint=True
574
- )
575
- self.maxDets = [1, 10, 100]
576
- self.areaRng = [
577
- [0**2, 1e5**2],
578
- [0**2, 32**2],
579
- [32**2, 96**2],
580
- [96**2, 1e5**2],
581
- ]
582
- self.areaRngLbl = ["all", "small", "medium", "large"]
583
- self.useCats = 1
584
-
585
- def setKpParams(self):
586
- self.imgIds = []
587
- self.catIds = []
588
- # np.arange causes trouble. the data point on arange is slightly larger than the true value
589
- self.iouThrs = np.linspace(
590
- 0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True
591
- )
592
- self.recThrs = np.linspace(
593
- 0.0, 1.00, int(np.round((1.00 - 0.0) / 0.01)) + 1, endpoint=True
594
- )
595
- self.maxDets = [20]
596
- self.areaRng = [[0**2, 1e5**2], [32**2, 96**2], [96**2, 1e5**2]]
597
- self.areaRngLbl = ["all", "medium", "large"]
598
- self.useCats = 1
599
- self.kpt_oks_sigmas = (
600
- np.array(
601
- [
602
- 0.26,
603
- 0.25,
604
- 0.25,
605
- 0.35,
606
- 0.35,
607
- 0.79,
608
- 0.79,
609
- 0.72,
610
- 0.72,
611
- 0.62,
612
- 0.62,
613
- 1.07,
614
- 1.07,
615
- 0.87,
616
- 0.87,
617
- 0.89,
618
- 0.89,
619
- ]
620
- )
621
- / 10.0
622
- )
623
-
624
- def __init__(self, iouType="segm"):
625
- if iouType == "bbox":
626
- self.setDetParams()
627
- else:
628
- raise Exception("iouType not supported")
629
- self.iouType = iouType
630
- # useSegm is deprecated
631
- self.useSegm = None
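The core of `accumulate()` in the deleted file above is the step that turns a score-sorted list of detections into precision values sampled at fixed recall thresholds. A minimal pure-Python sketch of that step follows. It is an illustration under simplifying assumptions, not the deleted code: it handles a single IoU threshold, ignores "ignore" flags, uses `bisect_left` in place of `np.searchsorted(..., side="left")`, and the name `sampled_precision` is hypothetical.

```python
from bisect import bisect_left


def sampled_precision(tp_flags, num_gt, rec_thrs):
    """Sample the precision envelope at the given recall thresholds.

    tp_flags: detections sorted by descending score; True marks a true positive.
    num_gt:   number of non-ignored ground-truth objects (must be > 0).
    """
    tp = fp = 0
    recall, precision = [], []
    for is_tp in tp_flags:
        tp += is_tp
        fp += not is_tp
        recall.append(tp / num_gt)
        precision.append(tp / (tp + fp))

    # Make precision monotonically non-increasing, scanning right to left.
    for i in range(len(precision) - 1, 0, -1):
        if precision[i] > precision[i - 1]:
            precision[i - 1] = precision[i]

    sampled = []
    for r in rec_thrs:
        idx = bisect_left(recall, r)  # first position with recall >= r
        # Recall levels that are never reached contribute a precision of 0.
        sampled.append(precision[idx] if idx < len(precision) else 0.0)
    return sampled


values = sampled_precision([True, False, True], num_gt=2, rec_thrs=[0.0, 0.5, 1.0])
print(values)
```

Averaging the sampled values over all recall thresholds gives the average precision for that one setting, which is what `summarize()` reports after taking the mean over the remaining dimensions.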
detection_metrics/pycocotools/mask.py DELETED
@@ -1,103 +0,0 @@
1
- __author__ = 'tsungyi'
2
-
3
- from detection_metrics.pycocotools import _mask
4
-
5
- # Interface for manipulating masks stored in RLE format.
6
- #
7
- # RLE is a simple yet efficient format for storing binary masks. RLE
8
- # first divides a vector (or vectorized image) into a series of piecewise
9
- # constant regions and then for each piece simply stores the length of
10
- # that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would
11
- # be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1]
12
- # (note that the odd counts are always the numbers of zeros). Instead of
13
- # storing the counts directly, additional compression is achieved with a
14
- # variable bitrate representation based on a common scheme called LEB128.
15
- #
16
- # Compression is greatest given large piecewise constant regions.
17
- # Specifically, the size of the RLE is proportional to the number of
18
- # *boundaries* in M (or for an image the number of boundaries in the y
19
- # direction). Assuming fairly simple shapes, the RLE representation is
20
- # O(sqrt(n)) where n is number of pixels in the object. Hence space usage
21
- # is substantially lower, especially for large simple objects (large n).
22
- #
23
- # Many common operations on masks can be computed directly using the RLE
24
- # (without need for decoding). This includes computations such as area,
25
- # union, intersection, etc. All of these operations are linear in the
26
- # size of the RLE, in other words they are O(sqrt(n)) where n is the area
27
- # of the object. Computing these operations on the original mask is O(n).
28
- # Thus, using the RLE can result in substantial computational savings.
29
- #
30
- # The following API functions are defined:
31
- # encode - Encode binary masks using RLE.
32
- # decode - Decode binary masks encoded via RLE.
33
- # merge - Compute union or intersection of encoded masks.
34
- # iou - Compute intersection over union between masks.
35
- # area - Compute area of encoded masks.
36
- # toBbox - Get bounding boxes surrounding encoded masks.
37
- # frPyObjects - Convert polygon, bbox, and uncompressed RLE to encoded RLE mask.
38
- #
39
- # Usage:
40
- # Rs = encode( masks )
41
- # masks = decode( Rs )
42
- # R = merge( Rs, intersect=false )
43
- # o = iou( dt, gt, iscrowd )
44
- # a = area( Rs )
45
- # bbs = toBbox( Rs )
46
- # Rs = frPyObjects( [pyObjects], h, w )
47
- #
48
- # In the API the following formats are used:
49
- # Rs - [dict] Run-length encoding of binary masks
50
- # R - dict Run-length encoding of binary mask
51
- # masks - [hxwxn] Binary mask(s) (must have type np.ndarray(dtype=uint8) in column-major order)
52
- # iscrowd - [nx1] list of np.ndarray. 1 indicates corresponding gt image has crowd region to ignore
53
- # bbs - [nx4] Bounding box(es) stored as [x y w h]
54
- # poly - Polygon stored as [[x1 y1 x2 y2...],[x1 y1 ...],...] (2D list)
55
- # dt,gt - May be either bounding boxes or encoded masks
56
- # Both poly and bbs are 0-indexed (bbox=[0 0 1 1] encloses first pixel).
57
- #
58
- # Finally, a note about the intersection over union (iou) computation.
59
- # The standard iou of a ground truth (gt) and detected (dt) object is
60
- # iou(gt,dt) = area(intersect(gt,dt)) / area(union(gt,dt))
61
- # For "crowd" regions, we use a modified criteria. If a gt object is
62
- # marked as "iscrowd", we allow a dt to match any subregion of the gt.
63
- # Choosing gt' in the crowd gt that best matches the dt can be done using
64
- # gt'=intersect(dt,gt). Since by definition union(gt',dt)=dt, computing
65
- # iou(gt,dt,iscrowd) = iou(gt',dt) = area(intersect(gt,dt)) / area(dt)
66
- # For crowd gt regions we use this modified criteria above for the iou.
67
- #
68
- # To compile run "python setup.py build_ext --inplace"
69
- # Please do not contact us for help with compiling.
70
- #
71
- # Microsoft COCO Toolbox. version 2.0
72
- # Data, paper, and tutorials available at: http://mscoco.org/
73
- # Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
74
- # Licensed under the Simplified BSD License [see coco/license.txt]
75
-
76
- iou = _mask.iou
77
- merge = _mask.merge
78
- frPyObjects = _mask.frPyObjects
79
-
80
- def encode(bimask):
81
- if len(bimask.shape) == 3:
82
- return _mask.encode(bimask)
83
- elif len(bimask.shape) == 2:
84
- h, w = bimask.shape
85
- return _mask.encode(bimask.reshape((h, w, 1), order='F'))[0]
86
-
87
- def decode(rleObjs):
88
- if type(rleObjs) == list:
89
- return _mask.decode(rleObjs)
90
- else:
91
- return _mask.decode([rleObjs])[:,:,0]
92
-
93
- def area(rleObjs):
94
- if type(rleObjs) == list:
95
- return _mask.area(rleObjs)
96
- else:
97
- return _mask.area([rleObjs])[0]
98
-
99
- def toBbox(rleObjs):
100
- if type(rleObjs) == list:
101
- return _mask.toBbox(rleObjs)
102
- else:
103
- return _mask.toBbox([rleObjs])[0]
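The header comment of the deleted file above describes the uncompressed RLE counts with two small examples. The sketch below reproduces those examples in pure Python. It covers only the plain list of run lengths, not the LEB128-style compressed string produced by the compiled `_mask` extension, and the names `rle_counts` and `rle_area` are hypothetical.

```python
def rle_counts(mask):
    """Return run lengths of a flat 0/1 mask; the first run always counts zeros."""
    counts = []
    current = 0  # value of the run being counted; runs alternate 0, 1, 0, ...
    run = 0
    for bit in mask:
        if bit == current:
            run += 1
        else:
            counts.append(run)
            current = bit
            run = 1
    counts.append(run)
    return counts


def rle_area(counts):
    """Area is the sum of the runs of ones, i.e. every second count."""
    return sum(counts[1::2])


print(rle_counts([0, 0, 1, 1, 1, 0, 1]))  # [2, 3, 1, 1]
print(rle_counts([1, 1, 1, 1, 1, 1, 0]))  # [0, 6, 1]
print(rle_area([2, 3, 1, 1]))             # 4
```

`rle_area` shows why operations such as area can run directly on the encoding: the cost depends on the number of runs, not on the number of pixels.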
detection_metrics/pycocotools/mask_utils.py DELETED
@@ -1,76 +0,0 @@
1
- # This code is a copy and paste with small modifications of the code:
2
- # https://github.com/rafaelpadilla/review_object_detection_metrics/blob/main/src/evaluators/coco_evaluator.py
3
-
4
- from typing import List
5
- import numpy as np
6
-
7
- class MaskEvaluator(object):
8
- @staticmethod
9
- def iou(
10
- dt: List[List[float]], gt: List[List[float]], iscrowd: List[bool]
11
- ) -> np.ndarray:
12
- """
13
- Calculate the intersection over union (IoU) between detection bounding boxes (dt) and \
14
- ground truth bounding boxes (gt).
15
- Reference: https://github.com/rafaelpadilla/review_object_detection_metrics
16
-
17
- Args:
18
- dt (List[List[float]]): List of detection bounding boxes in the \
19
- format [x, y, width, height].
20
- gt (List[List[float]]): List of ground-truth bounding boxes in the \
21
- format [x, y, width, height].
22
- iscrowd (List[bool]): List indicating if each ground-truth bounding box \
23
- is a crowd region or not.
24
-
25
- Returns:
26
- np.ndarray: Array of IoU values of shape (len(dt), len(gt)).
27
- """
28
- assert len(iscrowd) == len(gt), "iou(iscrowd=) must have the same length as gt"
29
- if len(dt) == 0 or len(gt) == 0:
30
- return []
31
- ious = np.zeros((len(dt), len(gt)), dtype=np.float64)
32
- for g_idx, g in enumerate(gt):
33
- for d_idx, d in enumerate(dt):
34
- ious[d_idx, g_idx] = _jaccard(d, g, iscrowd[g_idx])
35
- return ious
-
-
- def _jaccard(a: List[float], b: List[float], iscrowd: bool) -> float:
-     """
-     Calculate the Jaccard index (intersection over union) between two bounding boxes.
-     For "crowd" regions, we use a modified criteria. If a gt object is
-     marked as "iscrowd", we allow a dt to match any subregion of the gt.
-     Choosing gt' in the crowd gt that best matches the dt can be done using
-     gt'=intersect(dt,gt). Since by definition union(gt',dt)=dt, computing
-     iou(gt,dt,iscrowd) = iou(gt',dt) = area(intersect(gt,dt)) / area(dt)
-     For crowd gt regions we use this modified criteria above for the iou.
-
-     Args:
-         a (List[float]): Bounding box coordinates in the format [x, y, width, height].
-         b (List[float]): Bounding box coordinates in the format [x, y, width, height].
-         iscrowd (bool): Flag indicating if the second bounding box is a crowd region or not.
-
-     Returns:
-         float: Jaccard index between the two bounding boxes.
-     """
-     eps = 4e-12
-     xa, ya, x2a, y2a = a[0], a[1], a[0] + a[2], a[1] + a[3]
-     xb, yb, x2b, y2b = b[0], b[1], b[0] + b[2], b[1] + b[3]
-
-     # innermost left x
-     xi = max(xa, xb)
-     # innermost right x
-     x2i = min(x2a, x2b)
-     # same for y
-     yi = max(ya, yb)
-     y2i = min(y2a, y2b)
-
-     # calculate areas
-     Aa = max(x2a - xa, 0.) * max(y2a - ya, 0.)
-     Ab = max(x2b - xb, 0.) * max(y2b - yb, 0.)
-     Ai = max(x2i - xi, 0.) * max(y2i - yi, 0.)
-
-     if iscrowd:
-         return Ai / (Aa + eps)
-
-     return Ai / (Aa + Ab - Ai + eps)
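
As a sanity check on the two formulas in `_jaccard` above, the standalone sketch below recomputes them for a single pair of boxes. The helper name `box_iou` and the sample boxes are assumptions made for illustration; they are not part of the deleted file.

```python
from typing import List


def box_iou(a: List[float], b: List[float], iscrowd: bool = False) -> float:
    """IoU of two [x, y, width, height] boxes; `a` is the detection, `b` the ground truth."""
    eps = 4e-12  # same guard against division by zero as in the deleted code

    # Corners of the intersection rectangle
    xi, yi = max(a[0], b[0]), max(a[1], b[1])
    x2i = min(a[0] + a[2], b[0] + b[2])
    y2i = min(a[1] + a[3], b[1] + b[3])

    area_a = max(a[2], 0.0) * max(a[3], 0.0)
    area_b = max(b[2], 0.0) * max(b[3], 0.0)
    area_i = max(x2i - xi, 0.0) * max(y2i - yi, 0.0)

    if iscrowd:
        # Crowd ground truth: normalise by the detection area only
        return area_i / (area_a + eps)
    return area_i / (area_a + area_b - area_i + eps)


det = [0.0, 0.0, 2.0, 2.0]  # hypothetical detection, area 4
gt = [1.0, 1.0, 2.0, 2.0]   # hypothetical ground truth, area 4, overlap area 1

regular = box_iou(det, gt)               # 1 / (4 + 4 - 1)
crowd = box_iou(det, gt, iscrowd=True)   # 1 / 4
```

With the crowd flag set, the same overlap scores higher because the ground-truth area outside the detection no longer counts against the match.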
detection_metrics/utils.py DELETED
@@ -1,156 +0,0 @@
- import copy
- import pickle
- from typing import Dict, List, Tuple, Union
- from tqdm import tqdm
- import numpy as np
- import torch
- import torch.distributed as dist
- from datasets import Dataset
-
- from detection_metrics.pycocotools.cocoeval import COCOeval
-
- # Typings
- _TYPING_BOX = Tuple[float, float, float, float]
- _TYPING_SCORES = List[float]
- _TYPING_LABELS = List[int]
- _TYPING_BOXES = List[_TYPING_BOX]
- _TYPING_PRED_REF = Union[_TYPING_SCORES, _TYPING_LABELS, _TYPING_BOXES]
- _TYPING_PREDICTION = Dict[str, _TYPING_PRED_REF]
- _TYPING_REFERENCE = Dict[str, _TYPING_PRED_REF]
- _TYPING_PREDICTIONS = Dict[int, _TYPING_PREDICTION]
-
- def convert_to_xywh(boxes: torch.Tensor) -> torch.Tensor:
-     """
-     Convert bounding boxes from (xmin, ymin, xmax, ymax) format to (x, y, width, height) format.
-
-     Args:
-         boxes (torch.Tensor): Tensor of shape (N, 4) representing bounding boxes in \
-             (xmin, ymin, xmax, ymax) format.
-
-     Returns:
-         torch.Tensor: Tensor of shape (N, 4) representing bounding boxes in (x, y, width, height) \
-             format.
-     """
-     xmin, ymin, xmax, ymax = boxes.unbind(1)
-     return torch.stack((xmin, ymin, xmax - xmin, ymax - ymin), dim=1)
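
The conversion in `convert_to_xywh` is plain corner-to-size arithmetic. The sketch below restates it without torch, on ordinary tuples, so the mapping can be checked by hand. The name `xyxy_to_xywh` and the sample boxes are hypothetical and do not appear in the deleted file.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def xyxy_to_xywh(boxes: List[Box]) -> List[Box]:
    """Map (xmin, ymin, xmax, ymax) boxes to (x, y, width, height)."""
    return [
        (xmin, ymin, xmax - xmin, ymax - ymin)
        for xmin, ymin, xmax, ymax in boxes
    ]


# Hypothetical sample boxes in corner format
boxes_xyxy = [(10.0, 20.0, 50.0, 80.0), (0.0, 0.0, 5.0, 5.0)]
boxes_xywh = xyxy_to_xywh(boxes_xyxy)
# First box: width 50 - 10 = 40, height 80 - 20 = 60
```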
-
-
- def create_common_coco_eval(
-     coco_eval: COCOeval, img_ids: List[int], eval_imgs: np.ndarray
- ) -> None:
-     """
-     Create a common COCO evaluation by merging image IDs and evaluation images into the \
-     coco_eval object.
-
-     Args:
-         coco_eval: COCOeval evaluation object.
-         img_ids (List[int]): List of image IDs.
-         eval_imgs (np.ndarray): Array of evaluation images.
-     """
-     img_ids, eval_imgs = merge(img_ids, eval_imgs)
-     img_ids = list(img_ids)
-     eval_imgs = list(eval_imgs.flatten())
-
-     coco_eval.evalImgs = eval_imgs
-     coco_eval.params.imgIds = img_ids
-     coco_eval._paramsEval = copy.deepcopy(coco_eval.params)
-
-
- def merge(img_ids: List[int], eval_imgs: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
-     """
-     Merge image IDs and evaluation images from different processes.
-
-     Args:
-         img_ids (List[int]): List of image ID arrays from different processes.
-         eval_imgs (np.ndarray): Evaluation images from different processes.
-
-     Returns:
-         Tuple[np.ndarray, np.ndarray]: Merged image IDs and evaluation images.
-     """
-     all_img_ids = all_gather(img_ids)
-     all_eval_imgs = all_gather(eval_imgs)
-
-     merged_img_ids = []
-     for p in all_img_ids:
-         merged_img_ids.extend(p)
-
-     merged_eval_imgs = []
-     for p in all_eval_imgs:
-         merged_eval_imgs.append(p)
-
-     merged_img_ids = np.array(merged_img_ids)
-     merged_eval_imgs = np.concatenate(merged_eval_imgs, 2)
-
-     # keep only unique (and in sorted order) images
-     merged_img_ids, idx = np.unique(merged_img_ids, return_index=True)
-     merged_eval_imgs = merged_eval_imgs[..., idx]
-
-     return merged_img_ids, merged_eval_imgs
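
The de-duplication step at the end of `merge` relies on `np.unique(..., return_index=True)`, which returns the sorted unique IDs together with the index of each ID's first occurrence. The sketch below shows that behaviour on two simulated ranks. The rank data and the string placeholders standing in for per-image evaluation results are assumptions for illustration only.

```python
import numpy as np

# Two hypothetical ranks that both evaluated image 3
ids_rank0, ids_rank1 = [3, 1], [2, 3]

# Stand-in evaluation arrays; the last axis (axis 2) indexes images
evals_rank0 = np.array([[["r0-img3", "r0-img1"]]], dtype=object)
evals_rank1 = np.array([[["r1-img2", "r1-img3"]]], dtype=object)

merged_ids = np.array(ids_rank0 + ids_rank1)                       # [3, 1, 2, 3]
merged_evals = np.concatenate([evals_rank0, evals_rank1], axis=2)  # shape (1, 1, 4)

# Sorted unique IDs plus the position of each ID's first occurrence
unique_ids, first_idx = np.unique(merged_ids, return_index=True)

# Reorder and drop duplicates along the image axis
merged_evals = merged_evals[..., first_idx]
```

Because the first occurrence wins, the duplicate result for image 3 from the second rank is discarded and the remaining entries end up sorted by image ID.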
-
-
- def all_gather(data: List[int]) -> List[List[int]]:
-     """
-     Run all_gather on arbitrary picklable data (not necessarily tensors).
-
-     Args:
-         data (List[int]): any picklable object
-     Returns:
-         List[List[int]]: list of data gathered from each rank
-     """
-     world_size = get_world_size()
-     if world_size == 1:
-         return [data]
-
-     # serialized to a Tensor
-     buffer = pickle.dumps(data)
-     storage = torch.ByteStorage.from_buffer(buffer)
-     tensor = torch.ByteTensor(storage).to("cuda")
-
-     # obtain Tensor size of each rank
-     local_size = torch.tensor([tensor.numel()], device="cuda")
-     size_list = [torch.tensor([0], device="cuda") for _ in range(world_size)]
-     dist.all_gather(size_list, local_size)
-     size_list = [int(size.item()) for size in size_list]
-     max_size = max(size_list)
-
-     # receiving Tensor from all ranks
-     # we pad the tensor because torch all_gather does not support
-     # gathering tensors of different shapes
-     tensor_list = []
-     for _ in size_list:
-         tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
-     if local_size != max_size:
-         padding = torch.empty(
-             size=(max_size - local_size,), dtype=torch.uint8, device="cuda"
-         )
-         tensor = torch.cat((tensor, padding), dim=0)
-     dist.all_gather(tensor_list, tensor)
-
-     data_list = []
-     for size, tensor in zip(size_list, tensor_list):
-         buffer = tensor.cpu().numpy().tobytes()[:size]
-         data_list.append(pickle.loads(buffer))
-
-     return data_list
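
The pad-then-truncate scheme in `all_gather` can be followed without a GPU or a process group. The sketch below simulates two ranks inside one process, with `bytes` objects standing in for the uint8 tensors and a plain list standing in for the collective call. The payloads are hypothetical, and nothing here calls `torch.distributed`.

```python
import pickle

# Hypothetical payloads that two ranks would contribute
payloads = [[1, 2, 3], list(range(50))]

# Step 1: each rank pickles its data and reports the byte length
buffers = [pickle.dumps(p) for p in payloads]
sizes = [len(b) for b in buffers]
max_size = max(sizes)

# Step 2: pad every buffer to the common length, because the collective
# needs identically shaped tensors on every rank
padded = [b + bytes(max_size - len(b)) for b in buffers]

# Step 3: after the gather, cut each buffer back to its true size and unpickle
restored = [pickle.loads(buf[:size]) for size, buf in zip(sizes, padded)]
```

Exchanging the sizes first is what makes the truncation in step 3 possible; without it the receiver could not tell payload bytes from padding.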
-
-
- def get_world_size() -> int:
-     """
-     Get the number of processes in the distributed environment.
-
-     Returns:
-         int: Number of processes.
-     """
-     if not is_dist_avail_and_initialized():
-         return 1
-     return dist.get_world_size()
-
-
- def is_dist_avail_and_initialized() -> bool:
-     """
-     Check if distributed environment is available and initialized.
-
-     Returns:
-         bool: True if distributed environment is available and initialized, False otherwise.
-     """
-     return dist.is_available() and dist.is_initialized()
requirements.txt CHANGED
@@ -1,2 +1,3 @@
  torch
  torchvision
+ matplotlib