---
license: mit
language:
- en
pipeline_tag: zero-shot-image-classification
tags:
- ood-detection
- outlier-detection
---

<p style="font-size:28px;" align="center">
🏠 MOODv2
</p>

<p align="center">
β€’ πŸ€— <a href="https://huggingface.co/JingyaoLi/MOODv2" target="_blank">Model </a> 
β€’ 🐱 <a href="https://github.com/dvlab-research/MOOD" target="_blank">Code</a> 
β€’ πŸ“ƒ <a href="https://arxiv.org/abs/2302.02615" target="_blank">MOODv1</a>
β€’ πŸ“ƒ <a href="https://arxiv.org/abs/2401.02611" target="_blank">MOODv2</a> <br>
</p>

## Abstract
The crux of effective out-of-distribution (OOD) detection lies in acquiring a robust in-distribution (ID) representation that is distinct from OOD samples. While previous methods predominantly leaned on recognition-based techniques for this purpose, they often resulted in shortcut learning and lacked comprehensive representations. In our study, we conducted a comprehensive analysis across distinct pretraining tasks and various OOD score functions. The results highlight that feature representations pretrained through reconstruction yield a notable enhancement and narrow the performance gap among score functions, suggesting that even simple score functions can rival complex ones when leveraging reconstruction-based pretext tasks. Because reconstruction-based pretext tasks adapt well to a variety of score functions, they hold promising potential for further expansion. Our OOD detection framework, MOODv2, employs the masked image modeling pretext task. Without bells and whistles, MOODv2 improves AUROC by 14.30% to 95.68% on ImageNet and achieves 99.98% on CIFAR-10.
<p align="center">
<img src="imgs/framework.png" alt="framework" width="750">
</p>

## Performance
<p align="center">
<img src="imgs/moodv2_table.png" alt="table" width="900">
</p>

## Usage
To predict whether an input image is in-distribution or out-of-distribution, we support the following OOD detection methods (the simplest ones are sketched after the list):
- `MSP`
- `MaxLogit`
- `Energy`
- `Energy+React`
- `ViM`
- `Residual`
- `GradNorm`
- `Mahalanobis`
- `KL-Matching`
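For intuition, the three simplest scores in this list are plain functions of the classifier logits. Below is a minimal NumPy sketch (the function names and placeholder shapes are ours, for illustration only; the actual implementations live in `src/demo.py`):

```python
import numpy as np

def msp_score(logits):
    # MSP: maximum softmax probability (higher = more in-distribution)
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize the softmax
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return probs.max(axis=-1)

def maxlogit_score(logits):
    # MaxLogit: the largest raw logit
    return logits.max(axis=-1)

def energy_score(logits, temperature=1.0):
    # (Negative) energy: T * logsumexp(logits / T); higher = more in-distribution
    z = logits / temperature
    m = z.max(axis=-1)
    return temperature * (m + np.log(np.exp(z - m[..., None]).sum(axis=-1)))
```

Methods such as `ViM`, `Residual`, and `Mahalanobis` additionally need penultimate-layer features and ID statistics, which is why the demo below takes feature pickles as input.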

```bash
# change --img_path to your own image if needed
python src/demo.py \
   --img_path imgs/DTD_cracked_0004.jpg \
   --cfg configs/beit-base-p16_224px.py \
   --checkpoint pretrain/beitv2-base_3rdparty_in1k_20221114-73e11905.pth \
   --fc data/fc.pkl \
   --id_train_feature data/imagenet_train.pkl \
   --id_val_feature data/imagenet_test.pkl \
   --methods MSP MaxLogit Energy Energy+React ViM Residual GradNorm Mahalanobis
```
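As a rough illustration of what `--id_train_feature` is used for, a textbook `Mahalanobis` score fits per-class means and a shared covariance on ID training features, then scores a sample by its distance to the nearest class mean. The sketch below is a generic version under that assumption, not the repository's code (the function name and array layout are ours):

```python
import numpy as np

def mahalanobis_score(feats, train_feats, train_labels):
    """feats: (N, D) test features; train_feats: (M, D); train_labels: (M,)."""
    classes = np.unique(train_labels)
    # Per-class means and one covariance shared across all classes
    means = np.stack([train_feats[train_labels == c].mean(axis=0) for c in classes])
    centered = np.concatenate([train_feats[train_labels == c] - means[i]
                               for i, c in enumerate(classes)])
    precision = np.linalg.pinv(np.cov(centered, rowvar=False))
    # Squared Mahalanobis distance to every class mean; keep the nearest
    diffs = feats[:, None, :] - means[None, :, :]            # (N, C, D)
    d2 = np.einsum('ncd,de,nce->nc', diffs, precision, diffs)
    return -d2.min(axis=1)  # negate so that higher = more in-distribution
```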

For the example OOD image `imgs/DTD_cracked_0004.jpg`, you should see:
```
MSP  evaluation:   out-of-distribution 
MaxLogit  evaluation:   out-of-distribution 
Energy  evaluation:   out-of-distribution 
Energy+React  evaluation:   out-of-distribution 
ViM  evaluation:   out-of-distribution 
Residual  evaluation:   out-of-distribution 
GradNorm  evaluation:   out-of-distribution 
Mahalanobis  evaluation:   out-of-distribution
```
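Each line above is a thresholded decision rather than a raw score. A common convention, assumed here purely for illustration (check `src/demo.py` for the exact rule the demo uses), is to set each method's threshold so that 95% of ID validation samples are retained:

```python
import numpy as np

def verdict(score, id_val_scores, tpr=0.95):
    # Threshold at the (1 - tpr) quantile of ID validation scores:
    # 95% of ID samples score above it; anything below is flagged OOD.
    threshold = np.quantile(id_val_scores, 1.0 - tpr)
    return "in-distribution" if score >= threshold else "out-of-distribution"
```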

## Benchmark
To reproduce the results in our paper, please refer to our [repository](https://github.com/dvlab-research/MOOD) for details.