What is the difference between this and CLIP-ViT-g-14-laion2B-s12B-b42K and CLIP-ViT-bigG-14-laion2B-39B-b160k?
by AlexWortega
Can you add a README explaining the difference?
@AlexWortega it's a different version of https://huggingface.co/laion/CLIP-ViT-g-14-laion2B-s12B-b42K, not bigG (the larger model); it was trained on the same data, but for more samples seen and with a larger global batch size.
s34B means 34B samples seen during training, and b88K means a global batch size of ~88,000, versus the 12B samples seen and ~42,000 batch size of the earlier 'g' checkpoint.
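For anyone who wants to compare the two 'g' checkpoints side by side, here is a minimal sketch using open_clip; it assumes the laion2b_s12b_b42k and laion2b_s34b_b88k pretrained tags are registered in your installed open_clip version:

```python
import open_clip

# Earlier 'g' checkpoint: 12B samples seen, global batch size ~42K
model_s12b, _, preprocess_s12b = open_clip.create_model_and_transforms(
    'ViT-g-14', pretrained='laion2b_s12b_b42k'
)

# This checkpoint: 34B samples seen, global batch size ~88K
model_s34b, _, preprocess_s34b = open_clip.create_model_and_transforms(
    'ViT-g-14', pretrained='laion2b_s34b_b88k'
)

# Same ViT-g-14 architecture in both cases, so the tokenizer is shared
tokenizer = open_clip.get_tokenizer('ViT-g-14')
```

You can also point create_model_and_transforms straight at the Hub repo, e.g. 'hf-hub:laion/CLIP-ViT-g-14-laion2B-s34B-b88K', if your open_clip version supports hf-hub loading.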
From the https://github.com/mlfoundations/open_clip README, this model is the 78.5 entry. Yes, the README here should be updated, but there's a big stack of TODOs :)
- ViT-H/14 on LAION-2B with an accuracy of 78.0. The second best in1k zero-shot for released, open-source weights thus far.
- ViT-g/14 on LAION-2B with an accuracy of 76.6. This was trained on a reduced 12B samples-seen schedule, the same samples seen as the 400M models.
- ViT-g/14 on LAION-2B with an accuracy of 78.5. Full 34B samples seen schedule.
- ViT-G/14 on LAION-2B with an accuracy of 80.1. The best in1k zero-shot for released, open-source weights thus far.
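If you want to sanity-check zero-shot behaviour yourself, here is a minimal sketch of the standard open_clip zero-shot classification flow with this checkpoint; the image path and prompt set are just placeholders:

```python
import torch
from PIL import Image
import open_clip

# The 78.5 in1k zero-shot checkpoint discussed above
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-g-14', pretrained='laion2b_s34b_b88k'
)
model.eval()
tokenizer = open_clip.get_tokenizer('ViT-g-14')

image = preprocess(Image.open('example.jpg')).unsqueeze(0)  # placeholder image
text = tokenizer(['a photo of a cat', 'a photo of a dog'])  # placeholder prompts

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize embeddings, then softmax over scaled cosine similarities
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # per-prompt probabilities for the image
```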