|
<!DOCTYPE html> |
|
<html lang="en"> |
|
<head> |
|
<meta charset="UTF-8"> |
|
<title>NamedMask: Distilling Segmenters from Complementary Foundation Models</title> |
|
</head> |
|
<body> |
|
This is a demo of <a href="https://arxiv.org/pdf/2209.11228.pdf">NamedMask: Distilling Segmenters from Complementary Foundation Models</a>.<br> |
|
The goal of this work is to segment and name regions of images without access to pixel-level labels during training. |
|
To tackle this task, we construct segmenters by distilling the complementary strengths of two foundation models. |
|
The first, CLIP (Radford et al. 2021), exhibits the ability to assign names to image content but lacks an accessible representation of object structure. |
|
The second, DINO (Caron et al. 2021), captures the spatial extent of objects but has no knowledge of object names. |
|
Our method, termed NamedMask, begins by using CLIP to construct category-specific archives of images. |
|
The archive images are first pseudo-labelled with a category-agnostic salient object detector bootstrapped from DINO; these pseudo-masks are then refined by category-specific segmenters trained on the CLIP archive labels. |
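The archive-construction step above can be sketched as follows: score each image against each category's text embedding (CLIP-style cosine similarity) and keep the most confident images per category. This is a minimal toy sketch with hand-made stand-in embeddings, not the authors' implementation; the function name `build_category_archives` and the toy data are illustrative assumptions.

```python
import numpy as np

def build_category_archives(image_embs, text_embs, top_k=2):
    """Assign each image to its most similar category and keep the
    top_k most confident images per category.

    Embeddings are assumed L2-normalised, so a dot product equals
    cosine similarity (as in CLIP's image-text matching)."""
    sims = image_embs @ text_embs.T        # (n_images, n_categories)
    best_cat = sims.argmax(axis=1)         # most similar category per image
    best_sim = sims.max(axis=1)            # confidence of that assignment
    archives = {}
    for c in range(text_embs.shape[0]):
        idx = np.where(best_cat == c)[0]
        # keep the top_k images most similar to category c
        idx = idx[np.argsort(-best_sim[idx])][:top_k]
        archives[c] = idx.tolist()
    return archives

# Toy example: four "image" embeddings and two "category" text embeddings.
img = np.array([[1.0, 0.0, 0.0],
                [0.9, 0.1, 0.0],
                [0.0, 1.0, 0.0],
                [0.1, 0.9, 0.0]])
img /= np.linalg.norm(img, axis=1, keepdims=True)  # L2-normalise
txt = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])

archives = build_category_archives(img, txt)
# images 0 and 1 fall in category 0's archive; images 2 and 3 in category 1's
```

In the full method, each archive would then be pseudo-labelled and used to train a segmenter; this sketch only covers the ranking-by-similarity idea.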
|
Thanks to the high quality of the refined masks, we show that a standard segmentation architecture trained on these archives with appropriate data augmentation achieves impressive semantic segmentation abilities for both single-object and multi-object images. |
|
As a result, our proposed NamedMask performs favourably against a range of prior work on five benchmarks including the VOC2012, COCO and large-scale ImageNet-S datasets. |
|
Code is publicly available at <a href="https://github.com/NoelShin/namedmask">our repo</a>. |
|
</body> |
|
</html> |