# Adversarial CLIP ViT-B/32 Fine-Tuned on PatchCamelyon (PCAM)
## Overview
This repository contains a CLIP ViT-B/32 model fine-tuned on the PatchCamelyon (PCAM) dataset together with an adversarial variant of PCAM. The model is optimized for histopathological image classification.
## Model Highlights
- Model Type: CLIP Vision Transformer (ViT-B/32) with classification head
- Task: Binary classification of histopathological images (cancer vs. non-cancer)
- Base Model: `openai/clip-vit-base-patch32`
- Training Data: PatchCamelyon (PCAM) and Adversarial PCAM datasets
- Input: RGB images (224x224 pixels)
- Output: Binary classification (cancer/non-cancer)
## Key Results
### Clean Evaluation Metrics
- Clean Accuracy: 86.72%
### Adversarial Robustness (Fine-Tuned Model)
- PGD Attack:
  - Success Rate: 17.87%
  - Average L2 Distance: 12.09
- FGSM Attack:
  - Success Rate: 17.38%
  - Average L2 Distance: 12.10
- DeepFool Attack:
  - Success Rate: 35.62%
  - Average L2 Distance: 234.13
### Base Model Comparison
- Clean Accuracy: 86.30%
- PGD: 50.10% Success Rate | Avg L2 Distance: 12.08
- FGSM: 44.14% Success Rate | Avg L2 Distance: 12.10
- DeepFool: 81.64% Success Rate | Avg L2 Distance: 224.66
Hardware: Trained on an NVIDIA A100 GPU for 5 epochs.
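The exact attack implementations and hyperparameters behind the numbers above are not included in this card. As a rough sketch only, an FGSM perturbation and the success-rate/L2 statistics could be computed along these lines; the `epsilon` value and the success-rate definition here are assumptions, and `model` is the classifier defined under Usage below:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, pixel_values, labels, epsilon=0.03):
    """Single-step FGSM: move each pixel by epsilon along the loss-gradient sign."""
    pixel_values = pixel_values.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(pixel_values), labels)
    loss.backward()
    adv = pixel_values + epsilon * pixel_values.grad.sign()
    return adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid [0, 1] range

@torch.no_grad()
def attack_stats(model, clean, adv, labels):
    """Success rate over initially correct examples, plus mean L2 perturbation."""
    pred_clean = model(clean).argmax(dim=-1)
    pred_adv = model(adv).argmax(dim=-1)
    correct = pred_clean == labels
    success_rate = ((pred_adv != labels) & correct).sum() / correct.sum().clamp(min=1)
    avg_l2 = (adv - clean).flatten(1).norm(dim=1).mean()
    return success_rate.item(), avg_l2.item()
```

Averaging these statistics over the PCAM test split would yield numbers comparable to those reported above, though the reported results may use a different epsilon or success-rate convention.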
## Usage
### Installation
```bash
pip install transformers torch safetensors
```
### Inference Example
```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import Dataset
from transformers import CLIPVisionConfig, CLIPVisionModel


class PCamClassifier(nn.Module):
    """CLIP ViT-B/32 vision encoder with a binary classification head."""

    def __init__(self, config_dict):
        super().__init__()
        self.config = CLIPVisionConfig(**config_dict)
        self.vision_model = CLIPVisionModel(self.config)
        self.classifier = nn.Linear(self.config.hidden_size, 2)

    def forward(self, pixel_values):
        outputs = self.vision_model(pixel_values)
        return self.classifier(outputs.pooler_output)


# Model configuration (matches openai/clip-vit-base-patch32)
config_dict = {
    "_name_or_path": "openai/clip-vit-base-patch32",
    "architectures": ["CLIPVisionModel"],
    "attention_dropout": 0.0,
    "dropout": 0.0,
    "hidden_act": "quick_gelu",
    "hidden_size": 768,
    "image_size": 224,
    "initializer_factor": 1.0,
    "initializer_range": 0.02,
    "intermediate_size": 3072,
    "layer_norm_eps": 1e-05,
    "model_type": "clip_vision_model",
    "num_attention_heads": 12,
    "num_channels": 3,
    "num_hidden_layers": 12,
    "patch_size": 32,
    "projection_dim": 512,
    "torch_dtype": "float32",
}

# Initialize the model and load the fine-tuned weights
model = PCamClassifier(config_dict)
model.load_state_dict(torch.load("best_enhanced_pcam_model.pt", map_location="cpu"))


class PCamDataset(Dataset):
    """Wraps a PCAM split and converts images to CHW float tensors in [0, 1]."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        example = self.dataset[idx]
        image = example["image"].convert("RGB")
        image_array = np.array(image) / 255.0                            # scale to [0, 1]
        image_array = image_array.transpose(2, 0, 1).astype(np.float32)  # HWC -> CHW
        return {
            "pixel_values": image_array,
            "labels": example["label"],
        }
```
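Once the weights are loaded, a single patch can be classified as follows. This is a minimal sketch: the file name `patch.png` is illustrative, the preprocessing mirrors `PCamDataset` above, and the class order (index 1 = cancer) is an assumption to verify against the training labels.

```python
from PIL import Image
import numpy as np
import torch

model.eval()
image = Image.open("patch.png").convert("RGB").resize((224, 224))
pixel_values = np.array(image, dtype=np.float32) / 255.0          # HWC, [0, 1]
pixel_values = torch.from_numpy(pixel_values.transpose(2, 0, 1))  # HWC -> CHW
with torch.no_grad():
    logits = model(pixel_values.unsqueeze(0))                     # add batch dim
probs = logits.softmax(dim=-1).squeeze()
print(f"non-cancer: {probs[0].item():.3f}, cancer: {probs[1].item():.3f}")
```

Note that this intentionally matches the training-time pipeline in `PCamDataset` (plain scaling to [0, 1], no CLIP mean/std normalization).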
## Future Work
We plan to release:
- Enhanced robustness metrics
- Expanded adversarial attack evaluations
## License
Released under the Apache-2.0 License.
## Contact
For inquiries, please reach out to Venkata Tej at LensAI.