ai-forever committed • Commit 42a0887 • 1 Parent(s): 8135a8f
Create README.md

README.md (ADDED)
# ruclip-vit-large-patch14-336

**RuCLIP** (**Ru**ssian **C**ontrastive **L**anguage–**I**mage **P**retraining) is a multimodal model
for computing similarities between images and texts and for ranking captions against pictures (and vice versa).
RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing and
multimodal learning.

The model was trained by the [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
* Task: `text ranking`; `image ranking`; `zero-shot image classification`
* Type: `encoder`
* Num Parameters: `430M`
* Training Data Volume: `240 million text-image pairs`
* Language: `Russian`
* Context Length: `77`
* Transformer Layers: `12`
* Transformer Width: `768`
* Transformer Heads: `12`
* Image Size: `336`
* Vision Layers: `24`
* Vision Width: `1024`
* Vision Patch Size: `14`

## Usage [Github](https://github.com/sberbank-ai/ru-clip)

```
pip install ruclip
```

```python
import ruclip

# load the model weights together with the matching text/image processor
clip, processor = ruclip.load("ruclip-vit-large-patch14-336", device="cuda")
```
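
Once loaded, the model can rank a set of Russian captions against images (zero-shot classification). The sketch below assumes the `ruclip.Predictor` helper and its `get_text_latents`/`run` methods from the [ru-clip](https://github.com/sberbank-ai/ru-clip) repository; the image files and class labels are illustrative placeholders, not part of this card.

```python
import torch
import ruclip
from PIL import Image

device = "cuda"
clip, processor = ruclip.load("ruclip-vit-large-patch14-336", device=device)
print(sum(p.numel() for p in clip.parameters()))  # roughly 430M parameters

# hypothetical local images and candidate Russian labels
image_paths = ["cat.jpg", "dog.jpg"]
pil_images = [Image.open(path) for path in image_paths]
classes = ["кошка", "собака"]
templates = ["{}", "это {}", "на картинке {}"]

# Predictor batches preprocessing, prompt templating and encoding
predictor = ruclip.Predictor(clip, processor, device, bs=8, templates=templates)

with torch.no_grad():
    text_latents = predictor.get_text_latents(classes)     # one latent per class
    pred_labels = predictor.run(pil_images, text_latents)  # best class index per image

for path, idx in zip(image_paths, pred_labels):
    print(path, "->", classes[int(idx)])
```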

## Performance
We have evaluated the performance on the following datasets:

| Dataset       | Metric Name    | Metric Result |
|:--------------|:---------------|:--------------|
| Food101       | acc            | 0.712         |
| CIFAR10       | acc            | 0.906         |
| CIFAR100      | acc            | 0.591         |
| Birdsnap      | acc            | 0.213         |
| SUN397        | acc            | 0.523         |
| Stanford Cars | acc            | 0.659         |
| DTD           | acc            | 0.408         |
| MNIST         | acc            | 0.242         |
| STL10         | acc            | 0.956         |
| PCam          | acc            | 0.554         |
| CLEVR         | acc            | 0.142         |
| Rendered SST2 | acc            | 0.539         |
| ImageNet      | acc            | 0.488         |
| FGVC Aircraft | mean-per-class | 0.075         |
| Oxford Pets   | mean-per-class | 0.546         |
| Caltech101    | mean-per-class | 0.835         |
| Flowers102    | mean-per-class | 0.517         |
| HatefulMemes  | roc-auc        | 0.519         |
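
These benchmarks follow the standard zero-shot transfer protocol: class names are encoded as Russian texts and each image is assigned the closest class. A minimal sketch of such an evaluation for CIFAR10 is shown below; it assumes torchvision for the dataset and the same `ruclip.Predictor` helper as above. The Russian class names and prompt templates are illustrative, not the ones behind the table, so the resulting number may differ from 0.906.

```python
import torch
import ruclip
from torchvision.datasets import CIFAR10

device = "cuda"
clip, processor = ruclip.load("ruclip-vit-large-patch14-336", device=device)

# illustrative Russian class names and prompt templates for CIFAR10
classes = ["самолёт", "автомобиль", "птица", "кошка", "олень",
           "собака", "лягушка", "лошадь", "корабль", "грузовик"]
templates = ["{}", "это {}", "на картинке {}"]
predictor = ruclip.Predictor(clip, processor, device, bs=32, templates=templates)

dataset = CIFAR10(root=".", train=False, download=True)  # items are (PIL image, label)

correct, total = 0, 0
with torch.no_grad():
    text_latents = predictor.get_text_latents(classes)
    for start in range(0, len(dataset), 256):
        batch = [dataset[i] for i in range(start, min(start + 256, len(dataset)))]
        images = [img for img, _ in batch]
        labels = [lbl for _, lbl in batch]
        preds = predictor.run(images, text_latents)
        correct += sum(int(p) == int(l) for p, l in zip(preds, labels))
        total += len(labels)

print(f"zero-shot CIFAR10 accuracy: {correct / total:.3f}")
```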

# Authors

+ Alex Shonenkov: [Github](https://github.com/shonenkov), [Kaggle GM](https://www.kaggle.com/shonenkov)
+ Daniil Chesakov: [Github](https://github.com/Danyache)
+ Denis Dimitrov: [Github](https://github.com/denndimitrov)
+ Igor Pavlov: [Github](https://github.com/boomb0om)