---
license: mit
thumbnail: https://algolzw.github.io/daclip-uir/static/images/teaser.jpg
---


# Model Card: daclip-uir ViT-B/32 - irsde

## Model Details

### Model Description

This model extends CLIP to a degradation-aware version (DA-CLIP) that predicts both a degradation embedding and a clean content embedding from corrupted images. We then use these embeddings to improve image restoration performance and to enable unified image restoration. The base CLIP model is a pretrained ViT-B/32, and the base diffusion model for image restoration is [IR-SDE](https://arxiv.org/abs/2301.11699).
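
A degradation embedding can be compared against text embeddings of degradation names, much like CLIP's zero-shot classification. The sketch below is a toy illustration under stated assumptions: the 3-dimensional vectors are made up, and `normalize`, `softmax`, and `classify_degradation` are hypothetical helpers, not part of the DA-CLIP code. In the real model both embeddings come from the DA-CLIP image and text encoders; the clean content embedding is not shown here.

```python
import math
from typing import Dict, List


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_degradation(degradation_emb: List[float],
                         text_embs: Dict[str, List[float]],
                         scale: float = 100.0) -> Dict[str, float]:
    """Hypothetical helper: probability per degradation label, CLIP zero-shot style."""
    img = normalize(degradation_emb)
    labels = list(text_embs)
    logits = [scale * sum(a * b for a, b in zip(img, normalize(text_embs[k])))
              for k in labels]
    return dict(zip(labels, softmax(logits)))


# Toy 3-d vectors standing in for real encoder outputs (made up for illustration).
text_embs = {
    "hazy": [1.0, 0.1, 0.0],
    "rainy": [0.0, 1.0, 0.2],
    "low-light": [0.1, 0.0, 1.0],
}
degradation_emb = [0.9, 0.2, 0.1]  # pretend degradation embedding of a hazy image

probs = classify_degradation(degradation_emb, text_embs)
best = max(probs, key=probs.get)
print(best)  # hazy
```

The fixed `scale` of 100 is an assumption chosen to resemble the learned logit scale used with CLIP; it only sharpens the softmax and does not change which label wins.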


### Documents

Controlling Vision-Language Models for Universal Image Restoration - [paper](https://arxiv.org/abs/2310.01018).


### Intended Use

The model is intended as a research output for the research community. We hope that it will enable researchers to better understand and explore image degradation with vision-language models. Researchers in computer vision can use it to further improve the performance of their own models. We also encourage users who are interested in our work to train their own models with larger datasets and more degradation types.


### Performance

We have evaluated the performance of DA-CLIP and the downstream diffusion model on 10 different image restoration datasets:

- GoPro: Motion-blurry
- RESIDE-6k: Hazy
- LIVE1: JPEG-compressed
- LOL: Low-light
- CBSD68: Noisy
- RainDrop: Raindrop
- Rain100H: Rainy
- SRD: Shadowed
- Snow100K-L: Snowy
- CelebaHQ-256: Inpainting

### Limitations
The current pretrained model still struggles with some real-world images that may exhibit distribution shifts relative to our training dataset (e.g., images captured with different devices, at different resolutions, or with different degradations). We regard this as future work and will try to make our model more practical.
We also found that directly resizing input images leads to poor performance on most tasks. We could add a resizing step to training, but doing so always degrades image quality because of interpolation.
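
The interpolation problem can be seen in a one-dimensional toy: shrinking a row of pixels by 2x with a box filter and enlarging it again with linear interpolation erases a one-pixel stripe pattern entirely. The helpers `downsample_2x`, `upsample_2x`, and `mean_abs_error` are hypothetical and only illustrate the effect; they are not the resizing used in the DA-CLIP code.

```python
from typing import List


def downsample_2x(row: List[float]) -> List[float]:
    """Halve the width by averaging neighbouring pixel pairs (box filter)."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]


def upsample_2x(row: List[float]) -> List[float]:
    """Double the width with linear interpolation between neighbouring samples."""
    out: List[float] = []
    for i, v in enumerate(row):
        nxt = row[i + 1] if i + 1 < len(row) else v  # replicate the edge pixel
        out.extend([v, (v + nxt) / 2])
    return out


def mean_abs_error(a: List[float], b: List[float]) -> float:
    """Average absolute per-pixel difference between two rows."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# A one-pixel-wide stripe pattern: the kind of fine texture a restorer must keep.
row = [0.0, 1.0] * 8
round_trip = upsample_2x(downsample_2x(row))
print(round_trip[:4])                   # [0.5, 0.5, 0.5, 0.5]
print(mean_abs_error(row, round_trip))  # 0.5
```

Practical resizers (bilinear, bicubic) use different kernels, but any downscale discards detail finer than the new sampling grid, and a later upscale cannot bring it back.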


### Contact
If you have any questions, please contact: [email protected]


### Citations
If our code helps your research or work, please consider citing our paper:

```
@article{luo2023controlling,
  title={Controlling Vision-Language Models for Universal Image Restoration},
  author={Luo, Ziwei and Gustafsson, Fredrik K and Zhao, Zheng and Sj{\"o}lund, Jens and Sch{\"o}n, Thomas B},
  journal={arXiv preprint arXiv:2310.01018},
  year={2023}
}
```