---
library_name: transformers
pipeline_tag: image-feature-extraction
---

# Disclaimer 

This model belongs to Microsoft's Florence-2; all I have done is take the weights and modify the code to be compatible with Hugging Face pretrained models. The reason is that I want to use Florence-2 and its components with the Hugging Face framework and ONNX.

## DaViT Model

This repository contains the implementation of the DaViT (Dual-Attention Vision Transformer) model for image classification tasks. The model leverages dual attention mechanisms to improve performance on various image datasets.

## Model Description

DaViT (Dual-Attention Vision Transformer) is designed to handle image classification tasks effectively. It combines spatial and channel attention mechanisms to capture intricate details in images. The model has multiple stages, each with convolutional embeddings and attention blocks.
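The snippet below is a minimal, illustrative PyTorch sketch of how a spatial-attention block and a channel-attention block can be paired, in the spirit of the dual-attention design described above. It is not the code shipped in this repository: the module names, dimensions, and scaling choice are assumptions, the real model uses windowed rather than global spatial attention, and feed-forward sublayers and convolutional embeddings are omitted.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Self-attention computed across channel groups instead of spatial tokens (illustrative)."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                          # x: (B, N, C)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 4, 1)       # each: (B, heads, head_dim, N)
        q = q * (N ** -0.5)                        # scaling choice is illustrative
        attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)   # (B, heads, head_dim, head_dim)
        x = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, C)
        return self.proj(x)

class DualBlock(nn.Module):
    """One spatial-attention block followed by one channel-attention block."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.spatial = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.channel = ChannelAttention(dim, num_heads)

    def forward(self, x):                          # x: (B, N, C)
        y = self.norm1(x)
        x = x + self.spatial(y, y, y)[0]           # attention over spatial tokens
        x = x + self.channel(self.norm2(x))        # attention over channels
        return x

tokens = torch.randn(1, 196, 256)                  # e.g. 14x14 patches, 256-dim embeddings
print(DualBlock(256)(tokens).shape)                # torch.Size([1, 196, 256])
```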

### Example

Here is an example of how to load the DaViT model and extract features from an image:

```python
# Load the model and processor directly from the Hub
import os

import requests
from PIL import Image
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained(
    "amaye15/DaViT-Florence-2-large-ft", trust_remote_code=True, cache_dir=os.getcwd()
)
processor = AutoProcessor.from_pretrained(
    "amaye15/DaViT-Florence-2-large-ft", trust_remote_code=True, cache_dir=os.getcwd()
)

prompt = "<OCR>"
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image (the prompt is only consumed by the processor; the model takes pixel values)
inputs = processor(text=prompt, images=image, return_tensors="pt")

# Run the image through the DaViT encoder to obtain image features
features = model(inputs["pixel_values"])
```
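
Since the stated goal of this repository is compatibility with the Hugging Face framework and ONNX, the following is a minimal, untested sketch of how the encoder might be exported with `torch.onnx.export`, reusing `model` and `inputs` from the example above. The wrapper class, output name, file name, and opset version are assumptions and not part of this repository.

```python
import torch

class EncoderWrapper(torch.nn.Module):
    """Wraps the encoder so the ONNX exporter sees a plain-tensor output (hypothetical helper)."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, pixel_values):
        out = self.encoder(pixel_values)
        # If the model returns a ModelOutput/tuple, keep only the first tensor
        return out[0] if not torch.is_tensor(out) else out

torch.onnx.export(
    EncoderWrapper(model).eval(),
    (inputs["pixel_values"],),
    "davit_image_encoder.onnx",          # assumed file name
    input_names=["pixel_values"],
    output_names=["image_features"],     # assumed output name
    dynamic_axes={"pixel_values": {0: "batch"}},
    opset_version=17,                    # assumed opset
)
```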

## Credits

This model is inspired by and builds upon the ideas presented in the [Florence-2-large model by Microsoft](https://huggingface.co/microsoft/Florence-2-large).