# ruclip-vit-large-patch14-336

**RuCLIP** (**Ru**ssian **C**ontrastive **L**anguage–**I**mage **P**retraining) is a multimodal model for computing similarity between images and texts and for ranking captions and pictures. RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing, and multimodal learning.
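
To make the similarity and ranking idea concrete, here is a minimal sketch that scores captions against an image by cosine similarity and sorts them. It does not call the model: the 3-dimensional embeddings are made-up stand-ins for real encoder outputs, and the helper names (`l2_normalize`, `cosine_similarity`, `rank_captions`) are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_captions(image_embedding, caption_embeddings):
    """Return (caption, score) pairs sorted from most to least similar."""
    scored = [
        (caption, cosine_similarity(image_embedding, embedding))
        for caption, embedding in caption_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed values, not produced by RuCLIP).
image_embedding = [0.9, 0.1, 0.0]
caption_embeddings = {
    "кошка": [1.0, 0.0, 0.0],
    "собака": [0.0, 1.0, 0.0],
    "машина": [0.0, 0.0, 1.0],
}

ranking = rank_captions(image_embedding, caption_embeddings)
for caption, score in ranking:
    print(f"{caption}: {score:.3f}")
```

Ranking images for a given caption works the same way with the roles swapped: one text embedding is compared against many image embeddings.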

The model was trained by the [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.

* Task: `text ranking`; `image ranking`; `zero-shot image classification`
* Type: `encoder`
* Num Parameters: `430M`
* Training Data Volume: `240 million text-image pairs`
* Language: `Russian`
* Context Length: `77`
* Transformer Layers: `12`
* Transformer Width: `768`
* Transformer Heads: `12`
* Image Size: `336`
* Vision Layers: `24`
* Vision Width: `1024`
* Vision Patch Size: `14`
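
A few derived quantities follow from the numbers in the list above. The sketch below works them out. The extra class token in the vision sequence is an assumption based on standard ViT encoders and is not stated in this model card.

```python
# Values copied from the specification list.
image_size = 336
patch_size = 14
text_width = 768
text_heads = 12

# Each image is cut into a grid of non-overlapping square patches.
patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2

# Assumption: one class token is prepended to the patch sequence (standard ViT).
vision_sequence_length = num_patches + 1

# Per-head dimension of the text transformer.
text_head_dim = text_width // text_heads

print(patches_per_side, num_patches, vision_sequence_length, text_head_dim)
# 24 576 577 64
```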

## Usage [GitHub](https://github.com/sberbank-ai/ru-clip)

```
pip install ruclip
```

```python
import ruclip

# Load the model and its matching processor onto the GPU.
clip, processor = ruclip.load("ruclip-vit-large-patch14-336", device="cuda")
```
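
Once image and text embeddings are available, zero-shot image classification (one of the tasks listed for this model) typically comes down to comparing one image embedding with one text embedding per class label and applying a softmax. The sketch below illustrates only that step on plain Python lists. It does not call `ruclip`, the embeddings are made-up, `zero_shot_classify` is a hypothetical helper, and `logit_scale=100.0` is an assumed temperature rather than a value from this model card.

```python
import math


def normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, class_embeddings, logit_scale=100.0):
    """Map each class label to a probability from scaled cosine similarities."""
    image = normalize(image_embedding)
    labels = list(class_embeddings)
    logits = []
    for label in labels:
        text = normalize(class_embeddings[label])
        logits.append(logit_scale * sum(i * t for i, t in zip(image, text)))
    return dict(zip(labels, softmax(logits)))


# Toy 2-dimensional embeddings (assumed values, not model outputs).
class_embeddings = {"кошка": [0.8, 0.2], "собака": [0.2, 0.8]}
probabilities = zero_shot_classify([0.7, 0.3], class_embeddings)
predicted = max(probabilities, key=probabilities.get)
print(predicted, probabilities)
```

No labelled training images are involved: the class set is defined entirely by the text labels, which is what makes the classification zero-shot.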

## Performance

We have evaluated the performance on the following datasets:

| Dataset       | Metric Name    | Metric Result |
|:--------------|:---------------|:--------------|
| Food101       | acc            | 0.712         |
| CIFAR10       | acc            | 0.906         |
| CIFAR100      | acc            | 0.591         |
| Birdsnap      | acc            | 0.213         |
| SUN397        | acc            | 0.523         |
| Stanford Cars | acc            | 0.659         |
| DTD           | acc            | 0.408         |
| MNIST         | acc            | 0.242         |
| STL10         | acc            | 0.956         |
| PCam          | acc            | 0.554         |
| CLEVR         | acc            | 0.142         |
| Rendered SST2 | acc            | 0.539         |
| ImageNet      | acc            | 0.488         |
| FGVC Aircraft | mean-per-class | 0.075         |
| Oxford Pets   | mean-per-class | 0.546         |
| Caltech101    | mean-per-class | 0.835         |
| Flowers102    | mean-per-class | 0.517         |
| HatefulMemes  | roc-auc        | 0.519         |
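
The table reports three metric types: `acc`, `mean-per-class`, and `roc-auc`. As a rough illustration of how they differ, the sketch below implements the standard textbook definitions on toy labels. Whether the evaluation behind the table used exactly these definitions is an assumption, and the data is made-up.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_per_class_accuracy(y_true, y_pred):
    """Average of per-class accuracies, so rare classes weigh as much as common ones."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Imbalanced toy data: always predicting "cat" looks good on plain accuracy only.
y_true = ["cat", "cat", "cat", "cat", "dog"]
y_pred = ["cat", "cat", "cat", "cat", "cat"]
print(accuracy(y_true, y_pred))                     # 0.8
print(mean_per_class_accuracy(y_true, y_pred))      # 0.5
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

For a binary task, a ROC-AUC of 0.5 corresponds to chance-level ranking of positives against negatives.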

## Authors

+ Alex Shonenkov: [GitHub](https://github.com/shonenkov), [Kaggle GM](https://www.kaggle.com/shonenkov)
+ Daniil Chesakov: [GitHub](https://github.com/Danyache)
+ Denis Dimitrov: [GitHub](https://github.com/denndimitrov)
+ Igor Pavlov: [GitHub](https://github.com/boomb0om)