Vision Transformer (ViT) models for image classification converted to ggml format

Available models

| Model | Disk   | Mem     | SHA                                      |
|-------|--------|---------|------------------------------------------|
| tiny  | 12 MB  | ~20 MB  | 25ce65ff60e08a1a5b486685b533d79718e74c0f |
| small | 45 MB  | ~52 MB  | 7a9f85340bd1a3dcd4275f46d5ee1db66649700e |
| base  | 174 MB | ~179 MB | a10d29628977fe27691edf55b7238f899b8c02eb |
| large | 610 MB | ~597 MB | 5f27087930f21987050188f9dc9eea75ac607214 |
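
The 40-character hashes in the SHA column look like SHA-1 digests of the converted model files. Below is a minimal sketch of how a downloaded file could be checked against the table; the script name and model filename are placeholders, not files shipped with this repository.

```python
import hashlib
import sys

def sha1_of_file(path: str) -> str:
    """Compute the SHA-1 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    # Usage (hypothetical filename):
    #   python check_sha.py ggml-model-f16.gguf 25ce65ff60e08a1a5b486685b533d79718e74c0f
    path, expected = sys.argv[1], sys.argv[2]
    actual = sha1_of_file(path)
    print("OK" if actual == expected.lower() else f"MISMATCH: got {actual}")
```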

The models are pre-trained on ImageNet-21k and then fine-tuned on ImageNet-1k, with a patch size of 16 and an image size of 224.
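
With those settings, a 224×224 input is split into (224/16)² = 196 non-overlapping patches, and the standard ViT design adds one class token, giving a sequence length of 197. A small sketch of that arithmetic (the class-token detail is the usual ViT convention, not something stated above):

```python
image_size = 224  # fine-tuning input resolution
patch_size = 16   # side length of each square patch

patches_per_side = image_size // patch_size  # 14
num_patches = patches_per_side ** 2          # 196 patch tokens
sequence_length = num_patches + 1            # 197 with the [CLS] token

print(patches_per_side, num_patches, sequence_length)
```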

For more information, visit:

https://github.com/staghado/vit.cpp
