---
pipeline_tag: image-text-to-text
library_name: transformers
license: mit
---
# DiffCLIP: Differential Attention Meets CLIP
This repository contains the DiffCLIP model as presented in [DiffCLIP: Differential Attention Meets CLIP](https://huggingface.co/papers/2503.06626).
Project Page: https://hammoudhasan.github.io/DiffCLIP
Code: https://github.com/hammoudhasan/DiffCLIP
## How to Use
### Installation
```bash
# Clone the repository
git clone https://github.com/hammoudhasan/DiffCLIP.git
cd DiffCLIP
# Install dependencies
pip install -r requirements.txt
```
### Basic Usage
```python
import torch
from diff_clip import DiffCLIP_VITB16
# Create model
model = DiffCLIP_VITB16()
# Process image and text
image = torch.randn(1, 3, 224, 224)
text = torch.randint(0, 49408, (1, 77)) # Tokenized text
# Get embeddings
with torch.no_grad():
    outputs = model(image, text)

print(outputs["image_embed"].shape)  # Should be [1, 512]
print(outputs["text_embed"].shape)   # Should be [1, 512]
```
### Zero-Shot Classification
You can use the provided `test_models.py` script to perform zero-shot classification. See the [GitHub README](https://github.com/hammoudhasan/DiffCLIP) for details.
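
For reference, below is a minimal sketch of how zero-shot classification typically works with a CLIP-style model such as DiffCLIP. It reuses the `DiffCLIP_VITB16` interface and output keys from the Basic Usage example above; the tokenizer (`open_clip.tokenize` here), the prompt template, and the class names are illustrative assumptions, and the actual `test_models.py` script may differ.

```python
import torch
from diff_clip import DiffCLIP_VITB16

# Assumption: a CLIP-compatible tokenizer producing (N, 77) token ids.
# The DiffCLIP repo may ship its own tokenizer; open_clip's is used
# here purely for illustration.
from open_clip import tokenize

model = DiffCLIP_VITB16()
model.eval()

# Illustrative class names and prompt template
class_names = ["cat", "dog", "car"]
prompts = [f"a photo of a {c}" for c in class_names]

image = torch.randn(1, 3, 224, 224)  # replace with a preprocessed image
text = tokenize(prompts)             # (3, 77) token ids

with torch.no_grad():
    outputs = model(image, text)
    image_embed = outputs["image_embed"]  # (1, 512)
    text_embed = outputs["text_embed"]    # (3, 512)

# Normalize and score by cosine similarity, as in standard CLIP
# zero-shot classification (the 100x scale mirrors CLIP's logit scale)
image_embed = image_embed / image_embed.norm(dim=-1, keepdim=True)
text_embed = text_embed / text_embed.norm(dim=-1, keepdim=True)
logits = 100.0 * image_embed @ text_embed.T  # (1, 3)
probs = logits.softmax(dim=-1)

pred = class_names[probs.argmax(dim=-1).item()]
print(f"Predicted class: {pred}")
```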