AIMv2-Large-Patch14-Native Image Classification
This repository contains an adapted version of the original AIMv2 model, modified to be compatible with the AutoModelForImageClassification class from Hugging Face Transformers. This adaptation enables seamless use of the model for image classification tasks.
Note: this model has not been trained or fine-tuned for image classification.
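Since the model has not been fine-tuned, using it on a real task would presumably start with defining your own label set. The following is a minimal sketch of the label mappings that Transformers configs conventionally carry (`id2label`, `label2id`). The class names are hypothetical and only serve as an illustration.

```python
# Hypothetical class names; replace with the labels of your own dataset.
class_names = ["cat", "dog", "bird"]

# Transformers configs conventionally map integer ids to label strings and back.
id2label = {i: name for i, name in enumerate(class_names)}
label2id = {name: i for i, name in id2label.items()}
num_labels = len(class_names)
```

These mappings could then be passed as the `num_labels`, `id2label`, and `label2id` keyword arguments to `AutoModelForImageClassification.from_pretrained`. Whether this repository's custom code sizes the classification head from them is an assumption worth verifying before training.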
Introduction
We have adapted the original apple/aimv2-large-patch14-native model to work with AutoModelForImageClassification. The AIMv2 family consists of vision models pre-trained with a multimodal autoregressive objective, offering robust performance across various benchmarks.
Some highlights of the AIMv2 models include:
- Outperforming OAI CLIP and SigLIP on the majority of multimodal understanding benchmarks.
- Surpassing DINOv2 in open-vocabulary object detection and referring expression comprehension.
- Demonstrating strong recognition performance, with AIMv2-3B achieving 89.5% on ImageNet using a frozen trunk.
Usage
PyTorch
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Download a sample image from the COCO validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and the adapted model (the model uses custom code)
processor = AutoImageProcessor.from_pretrained(
    "amaye15/aimv2-large-patch14-native-image-classification",
)
model = AutoModelForImageClassification.from_pretrained(
    "amaye15/aimv2-large-patch14-native-image-classification",
    trust_remote_code=True,
)

# Preprocess the image and run a forward pass
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Get predicted class
predictions = outputs.logits.softmax(dim=-1)
predicted_class = predictions.argmax(-1).item()
print(f"Predicted class: {model.config.id2label[predicted_class]}")
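To make the post-processing step above concrete, here is a minimal plain-Python sketch of what softmax followed by a ranking computes, extended to return the top k classes instead of only the best one. The helper name `top_k_predictions`, the example logits, and the label map are all hypothetical and not part of this repository. In practice you would use tensor operations, as in the snippet above.

```python
import math

def top_k_predictions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one image."""
    # Subtract the largest logit before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical logits and label map, for illustration only.
example_logits = [2.0, 0.5, -1.0]
example_labels = {0: "cat", 1: "dog", 2: "bird"}
top = top_k_predictions(example_logits, example_labels, k=2)
```

With the model loaded as above, the same helper could be called as `top_k_predictions(outputs.logits[0].tolist(), model.config.id2label)`. Softmax preserves the ordering of the logits, so the top-ranked class is the same as the argmax of the raw logits.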
Model Details
- Model Name: amaye15/aimv2-large-patch14-native-image-classification
- Original Model: apple/aimv2-large-patch14-native
- Adaptation: Modified to be compatible with AutoModelForImageClassification for direct use in image classification tasks.
- Framework: PyTorch
Citation
If you use this model or find it helpful, please consider citing the original AIMv2 paper:
@article{fini2024multimodal,
  title={Multimodal Autoregressive Pre-training of Large Vision Encoders},
  author={Fini, Enrico and others},
  journal={arXiv preprint arXiv:2411.14402},
  year={2024}
}