|
# Corruption Benchmarking |
|
|
|
## Introduction |
|
|
|
We provide tools to test object detection and instance segmentation models on the image corruption benchmark defined in [Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming](https://arxiv.org/abs/1907.07484). |
|
This page provides basic tutorials how to use the benchmark. |
|
|
|
```latex |
|
@article{michaelis2019winter, |
|
title={Benchmarking Robustness in Object Detection: |
|
Autonomous Driving when Winter is Coming}, |
|
author={Michaelis, Claudio and Mitzkus, Benjamin and |
|
Geirhos, Robert and Rusak, Evgenia and |
|
Bringmann, Oliver and Ecker, Alexander S. and |
|
Bethge, Matthias and Brendel, Wieland}, |
|
journal={arXiv:1907.07484}, |
|
year={2019} |
|
} |
|
``` |
|
|
|
![image corruption example](../resources/corruptions_sev_3.png) |
|
|
|
## About the benchmark |
|
|
|
To submit results to the benchmark please visit the [benchmark homepage](https://github.com/bethgelab/robust-detection-benchmark) |
|
|
|
The benchmark is modelled after the [imagenet-c benchmark](https://github.com/hendrycks/robustness) which was originally |
|
published in [Benchmarking Neural Network Robustness to Common Corruptions and Perturbations](https://arxiv.org/abs/1903.12261) (ICLR 2019) by Dan Hendrycks and Thomas Dietterich. |
|
|
|
The image corruption functions are included in this library but can be installed separately using: |
|
|
|
```shell |
|
pip install imagecorruptions |
|
``` |
|
|
|
Compared to imagenet-c a few changes had to be made to handle images of arbitrary size and greyscale images. |
|
We also modfied the 'motion blur' and 'snow' corruptions to remove dependency from a linux specific library, |
|
which would have to be installed separately otherwise. For details please refer to the [imagecorruptions repository](https://github.com/bethgelab/imagecorruptions). |
|
|
|
## Inference with pretrained models |
|
|
|
We provide a testing script to evaluate a models performance on any combination of the corruptions provided in the benchmark. |
|
|
|
### Test a dataset |
|
|
|
- [x] single GPU testing |
|
- [ ] multiple GPU testing |
|
- [ ] visualize detection results |
|
|
|
You can use the following commands to test a models performance under the 15 corruptions used in the benchmark. |
|
|
|
```shell |
|
# single-gpu testing |
|
python tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] |
|
``` |
|
|
|
Alternatively different group of corruptions can be selected. |
|
|
|
```shell |
|
# noise |
|
python tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --corruptions noise |
|
|
|
# blur |
|
python tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --corruptions blur |
|
|
|
# wetaher |
|
python tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --corruptions weather |
|
|
|
# digital |
|
python tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --corruptions digital |
|
``` |
|
|
|
Or a costom set of corruptions e.g.: |
|
|
|
```shell |
|
# gaussian noise, zoom blur and snow |
|
python tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --corruptions gaussian_noise zoom_blur snow |
|
``` |
|
|
|
Finally the corruption severities to evaluate can be chosen. |
|
Severity 0 corresponds to clean data and the effect increases from 1 to 5. |
|
|
|
```shell |
|
# severity 1 |
|
python tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --severities 1 |
|
|
|
# severities 0,2,4 |
|
python tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --severities 0 2 4 |
|
``` |
|
|
|
## Results for modelzoo models |
|
|
|
The results on COCO 2017val are shown in the below table. |
|
|
|
Model | Backbone | Style | Lr schd | box AP clean | box AP corr. | box % | mask AP clean | mask AP corr. | mask % | |
|
:-----:|:---------:|:-------:|:-------:|:------------:|:------------:|:-----:|:-------------:|:-------------:|:------:| |
|
Faster R-CNN | R-50-FPN | pytorch | 1x | 36.3 | 18.2 | 50.2 | - | - | - | |
|
Faster R-CNN | R-101-FPN | pytorch | 1x | 38.5 | 20.9 | 54.2 | - | - | - | |
|
Faster R-CNN | X-101-32x4d-FPN | pytorch |1x | 40.1 | 22.3 | 55.5 | - | - | - | |
|
Faster R-CNN | X-101-64x4d-FPN | pytorch |1x | 41.3 | 23.4 | 56.6 | - | - | - | |
|
Faster R-CNN | R-50-FPN-DCN | pytorch | 1x | 40.0 | 22.4 | 56.1 | - | - | - | |
|
Faster R-CNN | X-101-32x4d-FPN-DCN | pytorch | 1x | 43.4 | 26.7 | 61.6 | - | - | - | |
|
Mask R-CNN | R-50-FPN | pytorch | 1x | 37.3 | 18.7 | 50.1 | 34.2 | 16.8 | 49.1 | |
|
Mask R-CNN | R-50-FPN-DCN | pytorch | 1x | 41.1 | 23.3 | 56.7 | 37.2 | 20.7 | 55.7 | |
|
Cascade R-CNN | R-50-FPN | pytorch | 1x | 40.4 | 20.1 | 49.7 | - | - | - | |
|
Cascade Mask R-CNN | R-50-FPN | pytorch | 1x| 41.2 | 20.7 | 50.2 | 35.7 | 17.6 | 49.3 | |
|
RetinaNet | R-50-FPN | pytorch | 1x | 35.6 | 17.8 | 50.1 | - | - | - | |
|
Hybrid Task Cascade | X-101-64x4d-FPN-DCN | pytorch | 1x | 50.6 | 32.7 | 64.7 | 43.8 | 28.1 | 64.0 | |
|
|
|
Results may vary slightly due to the stochastic application of the corruptions. |
|
|