SanghyukChun committed
Commit e5c6184
1 Parent(s): 5c76b2f

Update README.md

Files changed (1)
  1. README.md +53 -3
README.md CHANGED
@@ -1,3 +1,53 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+
+ ### Official implementation of the PCME++ model pre-trained on CC3M, CC12M, and RedCaps.
+
+ Zero-shot ImageNet-1k top-1 accuracy: 41.812% (trained for more iterations than the previous version)
+
+ - Paper: https://openreview.net/forum?id=ft1mr3WlGM
+ - GitHub: https://github.com/naver-ai/pcmepp
+ - The official version with 34.642% ImageNet-1k top-1 accuracy (mean-only zero-shot classification) is available at [SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps](https://huggingface.co/SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps)
+
+
+ ```python
+ import requests
+ from PIL import Image
+
+ import torch
+ from transformers import CLIPProcessor
+
+ # Check hf_models code here: https://github.com/naver-ai/pcmepp/tree/main/hf_models
+ from hf_models import HfPCMEPPModel, tokenize
+
+
+ # Image preprocessing reuses the CLIP ViT-B/16 processor.
+ processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
+ # IN-top1: 34.64%
+ # model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps")
+ # IN-top1: 41.81%
+ model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps-256M")
+
+
+ # Download one COCO validation image and preprocess it into pixel values.
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+ inputs = processor(images=image, return_tensors="pt", padding=True)
+ # Captions are tokenized with the repository's own tokenize helper.
+ texts = ["a photo of a cat", "a photo of a dog"]
+ texts = tokenize(texts)
+
+ outputs = model(images=inputs["pixel_values"], texts=texts)
+ # Image-to-text similarity: one row per image, one column per caption.
+ print("Logits:", outputs["image_features"] @ outputs["text_features"].T)
+ # Per-sample uncertainty: exponentiate the *_stds outputs, then average over dimensions.
+ print("Image uncertainty: ", torch.exp(outputs["image_stds"]).mean(dim=-1))
+ print("Text uncertainty: ", torch.exp(outputs["text_stds"]).mean(dim=-1))
+ ```
+
+ ```
+ @inproceedings{
+ chun2024pcmepp,
+ title={Improved Probabilistic Image-Text Representations},
+ author={Sanghyuk Chun},
+ booktitle={The Twelfth International Conference on Learning Representations},
+ year={2024},
+ url={https://openreview.net/forum?id=ft1mr3WlGM}
+ }
+ ```
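
The last three lines of the added usage example print a similarity matrix and two uncertainty vectors. As a rough illustration of what those expressions compute, here is a minimal, dependency-free sketch with made-up numbers. It assumes the `*_stds` outputs hold log standard deviations, which the `torch.exp` call suggests but this diff does not confirm. The helper names `l2_normalize`, `similarity_matrix`, and `mean_uncertainty` are hypothetical and are not part of `hf_models`; the toy vectors are normalized only to keep the numbers easy to read.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (toy stand-in for an embedding)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(image_feats, text_feats):
    """Plain-Python equivalent of `image_features @ text_features.T`."""
    return [
        [sum(a * b for a, b in zip(img, txt)) for txt in text_feats]
        for img in image_feats
    ]


def mean_uncertainty(log_stds):
    """Plain-Python equivalent of `torch.exp(log_stds).mean(dim=-1)`."""
    return [sum(math.exp(v) for v in row) / len(row) for row in log_stds]


# Made-up 2-D embeddings: one image, two captions.
image_features = [l2_normalize([3.0, 4.0])]
text_features = [l2_normalize([3.0, 4.0]), l2_normalize([4.0, -3.0])]

# Made-up log standard deviations (assumption: log space, one value per dimension).
image_log_stds = [[0.0, 0.0]]
text_log_stds = [[math.log(2.0), math.log(4.0)], [0.0, math.log(3.0)]]

logits = similarity_matrix(image_features, text_features)
image_unc = mean_uncertainty(image_log_stds)
text_unc = mean_uncertainty(text_log_stds)

print("Logits:", logits)
print("Image uncertainty:", image_unc)
print("Text uncertainty:", text_unc)
```

In this toy setup the first caption points in the same direction as the image and the second is orthogonal to it, so the first logit is the larger one. A higher mean of the exponentiated values reads as a more uncertain embedding.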