LensAI Logo

Adversarial CLIP ViT Base Patch32 Fine-Tuned on PatchCamelyon (PCAM)

Overview

This repository contains a CLIP ViT Base Patch32 model fine-tuned on the PatchCamelyon (PCAM) dataset and further trained on an adversarial PCAM dataset. The model is optimized for histopathological image classification.

πŸ“Œ Model Highlights

  • Model Type: CLIP Vision Transformer (ViT-B/32) with classification head
  • Task: Binary classification of histopathological images (cancer vs. non-cancer)
  • Base Model: openai/clip-vit-base-patch32
  • Training Data: PatchCamelyon (PCAM) and Adversarial PCAM datasets
  • Input: RGB images (224x224 pixels)
  • Output: Binary classification (cancer/non-cancer)

πŸš€ Key Results

βœ… Clean Evaluation Metrics

  • Clean Accuracy: 86.72%

βš”οΈ Adversarial Robustness (Fine-tuned Model)

  • PGD Attack:
    • Success Rate: 17.87%
    • Average L2 Distance: 12.09
  • FGSM Attack:
    • Success Rate: 17.38%
    • Average L2 Distance: 12.10
  • DeepFool Attack:
    • Success Rate: 35.62%
    • Average L2 Distance: 234.13
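
For reference, the success rates above measure how often an attack flips the model's prediction. A minimal FGSM evaluation sketch in plain PyTorch is shown below; it is not the exact evaluation script used for these numbers, and it assumes success is counted over samples the model classifies correctly on clean inputs, that pixel values lie in [0, 1], and that the model and data loader follow the Usage section further down (the epsilon value is a placeholder).

import torch
import torch.nn.functional as F

def fgsm_success_rate(model, loader, eps=0.03, device="cpu"):
    """Fraction of correctly classified clean samples whose prediction flips under FGSM."""
    model.eval()
    flipped, clean_correct = 0, 0
    for batch in loader:
        x = batch["pixel_values"].to(device).requires_grad_(True)
        y = batch["labels"].to(device)
        logits = model(x)
        mask = logits.argmax(dim=-1) == y  # only count samples the model gets right on clean data
        grad = torch.autograd.grad(F.cross_entropy(logits, y), x)[0]
        x_adv = (x + eps * grad.sign()).clamp(0.0, 1.0)  # single signed-gradient step, clipped to [0, 1]
        with torch.no_grad():
            adv_pred = model(x_adv).argmax(dim=-1)
        flipped += ((adv_pred != y) & mask).sum().item()
        clean_correct += mask.sum().item()
    return flipped / max(clean_correct, 1)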

πŸ“Š Base Model Comparison

  • Clean Accuracy: 86.30%
  • PGD: 50.10% Success Rate | Avg L2 Distance: 12.08
  • FGSM: 44.14% Success Rate | Avg L2 Distance: 12.10
  • DeepFool: 81.64% Success Rate | Avg L2 Distance: 224.66

Training: 5 epochs on an NVIDIA A100 GPU


πŸ”§ Usage

Installation

pip install transformers torch safetensors datasets numpy pillow
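
The inference example below loads a checkpoint named best_enhanced_pcam_model.pt. Assuming that file is hosted in this model repository (check the repository's file listing for the exact name), it can be fetched with huggingface_hub, which ships as a transformers dependency:

from huggingface_hub import hf_hub_download

# Download the fine-tuned checkpoint from the Hub.
# The filename is taken from the inference example below; adjust it if the
# repository stores the weights under a different name.
checkpoint_path = hf_hub_download(
    repo_id="lens-ai/adversarial-clip-vit-base-patch32_pcam_finetuned",
    filename="best_enhanced_pcam_model.pt",
)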

Inference Example

from transformers import CLIPVisionConfig, CLIPVisionModel
import torch
from torch import nn
from torch.utils.data import Dataset
import numpy as np

class PCamClassifier(nn.Module):
    def __init__(self, config_dict):
        super().__init__()
        self.config = CLIPVisionConfig(**config_dict)
        self.vision_model = CLIPVisionModel(self.config)
        # Binary head on top of the pooled CLIP vision features (non-cancer vs. cancer)
        self.classifier = nn.Linear(self.config.hidden_size, 2)

    def forward(self, pixel_values):
        outputs = self.vision_model(pixel_values)
        return self.classifier(outputs.pooler_output)

# Vision tower configuration (matches openai/clip-vit-base-patch32)
config_dict = {
    "_name_or_path": "openai/clip-vit-base-patch32",
    "architectures": ["CLIPVisionModel"],
    "attention_dropout": 0.0,
    "dropout": 0.0,
    "hidden_act": "quick_gelu",
    "hidden_size": 768,
    "image_size": 224,
    "initializer_factor": 1.0,
    "initializer_range": 0.02,
    "intermediate_size": 3072,
    "layer_norm_eps": 1e-05,
    "model_type": "clip_vision_model",
    "num_attention_heads": 12,
    "num_channels": 3,
    "num_hidden_layers": 12,
    "patch_size": 32,
    "projection_dim": 512,
    "torch_dtype": "float32"
}

# Initialize the model and load the fine-tuned weights for inference
model = PCamClassifier(config_dict)
model.load_state_dict(torch.load('best_enhanced_pcam_model.pt', map_location='cpu'))
model.eval()


# Wraps a Hugging Face PCAM split: converts each patch to RGB, scales pixels to [0, 1],
# and returns channels-first float32 arrays suitable for the classifier above.
class PCamDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset
        
    def __len__(self):
        return len(self.dataset)
        
    def __getitem__(self, idx):
        example = self.dataset[idx]
        image = example["image"].convert("RGB")
        image_array = np.array(image) / 255.0
        image_array = image_array.transpose(2, 0, 1).astype(np.float32)
        return {
            "pixel_values": image_array,
            "labels": example["label"]
        }
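
A minimal single-image prediction sketch follows, mirroring the preprocessing in PCamDataset above (resize to 224x224, scale to [0, 1], channels-first, no extra normalization). The file name patch.png is a placeholder, and mapping class index 1 to "cancer" follows the PCAM labeling convention but should be verified against the training labels.

from PIL import Image

# Load one patch, resize to the model's 224x224 input, and scale to [0, 1] channels-first.
image = Image.open("patch.png").convert("RGB").resize((224, 224))
pixel_values = np.array(image, dtype=np.float32).transpose(2, 0, 1) / 255.0  # shape (3, 224, 224)

with torch.no_grad():
    logits = model(torch.from_numpy(pixel_values).unsqueeze(0))  # shape (1, 2)
    prediction = logits.argmax(dim=-1).item()

print("cancer" if prediction == 1 else "non-cancer")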

πŸ“Š Future Work

We plan to release:

  • Enhanced robustness metrics
  • Expanded adversarial attack evaluations

πŸ“œ License

Released under the Apache-2.0 License.

πŸ“¬ Contact

For inquiries, please reach out to Venkata Tej at LensAI.
