File size: 2,645 Bytes
11c16d6 c28b362 11c16d6 df82fcd 11c16d6 149f17e 11c16d6 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 |
---
tags:
- dino
- vision
datasets:
- imagenet-1k
---
# DINO ResNet-50
ResNet-50 pretrained with DINO. DINO was introduced in [Emerging Properties in Self-Supervised Vision Transformers](https://arxiv.org/abs/2104.14294), while ResNet was introduced in [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385). The official implementation of a DINO Resnet-50 can be found [here](https://github.com/facebookresearch/dino).
Weights converted from the official [DINO ResNet](https://github.com/facebookresearch/dino#pretrained-models-on-pytorch-hub) using [this script](https://colab.research.google.com/drive/1Ax3IDoFPOgRv4l7u6uS8vrPf4TX827BK?usp=sharing).
For up-to-date model card information, please see the [original repo](https://github.com/facebookresearch/dino).
### How to use
**Warning: The feature extractor in this repo is a copy of the one from [`microsoft/resnet-50`](https://huggingface.co/microsoft/resnet-50). We never verified if this image prerprocessing is the one used with DINO ResNet-50.**
```python
from transformers import AutoFeatureExtractor, ResNetModel
from PIL import Image
import requests
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
feature_extractor = AutoFeatureExtractor.from_pretrained('Ramos-Ramos/dino-resnet-50')
model = ResNetModel.from_pretrained('Ramos-Ramos/dino-resnet-50')
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state
```
### BibTeX entry and citation info
```bibtex
@article{DBLP:journals/corr/abs-2104-14294,
author = {Mathilde Caron and
Hugo Touvron and
Ishan Misra and
Herv{\'{e}} J{\'{e}}gou and
Julien Mairal and
Piotr Bojanowski and
Armand Joulin},
title = {Emerging Properties in Self-Supervised Vision Transformers},
journal = {CoRR},
volume = {abs/2104.14294},
year = {2021},
url = {https://arxiv.org/abs/2104.14294},
archivePrefix = {arXiv},
eprint = {2104.14294},
timestamp = {Tue, 04 May 2021 15:12:43 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2104-14294.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
```bibtex
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
``` |