SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps-256M

pytorch_model_hub_mixin

model_hub_mixin


SanghyukChun committed on May 26

Commit be6e736 • 1 Parent(s): e5c6184

Push model using huggingface_hub.

Files changed (3)

README.md +6 -50
config.json +18 -0
model.safetensors +3 -0

README.md CHANGED

@@ -1,53 +1,9 @@
 ---
-license: mit
 ---
-### Official implementation of PCME++ pre-trained model on CC3M, CC12M and RedCaps.
-Zero-shot ImageNet-1k top-1 accuracy: 41.812% (with longer training iterations than the previous version)
-- Paper: https://openreview.net/forum?id=ft1mr3WlGM
-- GitHub: https://github.com/naver-ai/pcmepp
-- Check the official version with ImageNet-1k top-1 accuracy 34.642% (mean-only ZS classification) at [SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps](https://huggingface.co/SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps)
-```python
-import requests
-from PIL import Image
-import torch
-from transformers import CLIPProcessor
-# Check hf_models code here: https://github.com/naver-ai/pcmepp/tree/main/hf_models
-from hf_models import HfPCMEPPModel, tokenize
-processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
-# IN-top1: 34.64%
-# model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps")
-# IN-top1: 41.81%
-model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps-256M")
-url = "http://images.cocodataset.org/val2017/000000039769.jpg"
-image = Image.open(requests.get(url, stream=True).raw)
-inputs = processor(images=image, return_tensors="pt", padding=True)
-texts = ["a photo of a cat", "a photo of a dog"]
-texts = tokenize(texts)
-outputs = model(images=inputs["pixel_values"], texts=texts)
-print("Logits:", outputs["image_features"] @ outputs["text_features"].T)
-print("Image uncertainty: ", torch.exp(outputs["image_stds"]).mean(dim=-1))
-print("Text uncertainty: ", torch.exp(outputs["text_stds"]).mean(dim=-1))
-```
-```
-@inproceedings{
-chun2024pcmepp,
-title={Improved Probabilistic Image-Text Representations},
-author={Sanghyuk Chun},
-booktitle={The Twelfth International Conference on Learning Representations},
-year={2024},
-url={https://openreview.net/forum?id=ft1mr3WlGM}
-}
-```

 ---
+tags:
+- pytorch_model_hub_mixin
+- model_hub_mixin
 ---
+This model has been pushed to the Hub using ****:
+- Repo: [More Information Needed]
+- Docs: [More Information Needed]

config.json ADDED

@@ -0,0 +1,18 @@

+{
+  "embed_dim": 512,
+  "text_cfg": {
+    "context_length": 77,
+    "heads": 8,
+    "layers": 12,
+    "unc_layers": 2,
+    "vocab_size": 49408,
+    "width": 512
+  },
+  "vision_cfg": {
+    "image_size": 224,
+    "layers": 12,
+    "patch_size": 16,
+    "unc_layers": 2,
+    "width": 768
+  }
+}
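
As a quick sanity check on the values added in `config.json`, the sketch below derives two quantities that follow from them: the number of image patches per input and the per-head width of the text transformer. The helper names `vision_patch_count` and `text_head_dim` are hypothetical and are not part of the `hf_models` code. The sketch assumes the usual ViT convention of non-overlapping square patches and an even split of `width` across `heads`, neither of which this commit states.

```python
import json

# Values copied from the config.json added in this commit.
CONFIG_JSON = """
{
  "embed_dim": 512,
  "text_cfg": {"context_length": 77, "heads": 8, "layers": 12,
               "unc_layers": 2, "vocab_size": 49408, "width": 512},
  "vision_cfg": {"image_size": 224, "layers": 12, "patch_size": 16,
                 "unc_layers": 2, "width": 768}
}
"""


def vision_patch_count(vision_cfg: dict) -> int:
    """Patches per image, assuming non-overlapping square patches."""
    image_size = vision_cfg["image_size"]
    patch_size = vision_cfg["patch_size"]
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def text_head_dim(text_cfg: dict) -> int:
    """Per-head width, assuming width is split evenly across heads."""
    width = text_cfg["width"]
    heads = text_cfg["heads"]
    if width % heads != 0:
        raise ValueError("width must be divisible by heads")
    return width // heads


config = json.loads(CONFIG_JSON)
num_patches = vision_patch_count(config["vision_cfg"])
head_dim = text_head_dim(config["text_cfg"])
print(f"patches per image: {num_patches}, text head dim: {head_dim}")
```

Whether the vision tower prepends a class token to these patches is not visible from the config alone, so the sketch reports only the patch count.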

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:751c23b80a2b269a7b7c235636f8523b0ca4f5ca29dc0b725fd5b991787a5e10
+size 683066852
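
The three added lines are a Git LFS pointer, not the weights themselves: they record the spec version, a SHA-256 digest of the real file, and its size in bytes. A minimal sketch of reading such a pointer and checking a downloaded file against it follows. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the check assumes a locally downloaded copy of `model.safetensors`.

```python
import hashlib

# Pointer contents copied from the model.safetensors diff in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:751c23b80a2b269a7b7c235636f8523b0ca4f5ca29dc0b725fd5b991787a5e10
size 683066852
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    hasher = hashlib.sha256()
    total = 0
    with open(path, "rb") as handle:
        # Read in chunks so a ~650 MiB file is never fully held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size_bytes"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
size_mib = pointer["size_bytes"] / 2**20
print(f"{pointer['hash_algo']} digest, {size_mib:.1f} MiB")
```

With a local copy of the weights, `matches_pointer("model.safetensors", pointer)` would confirm that the download is complete and uncorrupted.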