CLIP

A Contrastive Language-Image Pre-training (CLIP) model pre-trained on 2.5 billion image-text pairs from CommonCrawl at a resolution of 224x224. The approach was introduced in the paper Learning Transferable Visual Models From Natural Language Supervision and reproduced with a transparent data-curation recipe in the follow-up paper Demystifying CLIP Data. The weights were converted from the h14_fullcc2.5b.pt file released in the original repository.
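
The checkpoint can be used for zero-shot image classification. Below is a minimal sketch, assuming the converted weights load with the standard transformers CLIPModel and CLIPProcessor classes under the repository id cs-giung/clip-vit-huge-patch14-fullcc2.5b (the example image URL and prompts are placeholders):

```python
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed repository id; adjust if the hosted configuration differs.
model_id = "cs-giung/clip-vit-huge-patch14-fullcc2.5b"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# Example image (COCO validation sample) and candidate text labels.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of a cat", "a photo of a dog"]

# The processor resizes the image to 224x224 and tokenizes the texts.
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity scores, turned into probabilities over the labels.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```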

Model size: 986M parameters (Safetensors; tensor types: I64, F32)
