amaye15/Florence-2-DaViT-large-ft

Image Feature Extraction

feature-extraction

---
library_name: transformers
tags: []
---

# DaViT Model

This repository contains an implementation of the DaViT (Dual-Attention Vision Transformer) model for image classification tasks, packaged for the `transformers` library. The model pairs two attention mechanisms, one over spatial positions and one over feature channels.

## Model Description

DaViT (Dual-Attention Vision Transformer) is a vision transformer built around two complementary attention mechanisms. Spatial attention relates image patches to one another within local windows, while channel attention relates feature channels to one another, so each channel token summarizes the whole image. The model is organized into multiple stages, each consisting of a convolutional embedding layer followed by attention blocks.
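
The two attention types described above can be illustrated with a minimal PyTorch sketch. It is not the code in `modeling_davit.py`: the function names are hypothetical, there are no learned query/key/value projections, and the window size and group count are arbitrary values chosen for illustration.

```python
import torch


def attend(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Plain scaled dot-product attention over the second-to-last dimension."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return scores.softmax(dim=-1) @ v


def spatial_window_attention(x: torch.Tensor, window: int) -> torch.Tensor:
    """Attention among the patch tokens inside each non-overlapping window.

    x has shape (batch, height, width, channels); height and width must be
    divisible by `window`. Simplification: q = k = v = x.
    """
    b, h, w, c = x.shape
    # Partition the feature map into windows of window * window tokens.
    t = x.reshape(b, h // window, window, w // window, window, c)
    t = t.permute(0, 1, 3, 2, 4, 5).reshape(-1, window * window, c)
    t = attend(t, t, t)
    # Undo the window partition.
    t = t.reshape(b, h // window, w // window, window, window, c)
    return t.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, c)


def channel_group_attention(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Attention among channel tokens inside each channel group.

    Every channel becomes one token whose features are its values at all
    height * width positions, so each token covers the whole image.
    """
    b, h, w, c = x.shape
    t = x.reshape(b, h * w, c).transpose(1, 2)       # (b, c, h * w)
    t = t.reshape(b, groups, c // groups, h * w)     # split channels into groups
    t = attend(t, t, t)
    return t.reshape(b, c, h * w).transpose(1, 2).reshape(b, h, w, c)


def dual_attention_block(x: torch.Tensor, window: int, groups: int) -> torch.Tensor:
    """Spatial attention followed by channel attention, each with a residual."""
    x = x + spatial_window_attention(x, window)
    return x + channel_group_attention(x, groups)


features = torch.randn(2, 8, 8, 16)  # (batch, height, width, channels)
print(dual_attention_block(features, window=4, groups=4).shape)
```

In this sketch a change to one patch only affects the spatial-attention output inside that patch's window, whereas it can affect the channel-attention output at every position. That contrast is the motivation for combining the two.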

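### Example

The following example loads the model and runs a forward pass on a random input tensor:

```python
import torch
from transformers import AutoConfig, AutoModel

# Repository ID of this model on the Hugging Face Hub
repo_id = "amaye15/Florence-2-DaViT-large-ft"

# Load the configuration and model. The DaViT classes are defined in this
# repository rather than in transformers, so trust_remote_code=True is
# assumed to be required.
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

# Generate a random sample input tensor with shape (batch_size, channels, height, width)
batch_size, channels, height, width = 2, 3, 224, 224
sample_input = torch.randn(batch_size, channels, height, width)

# Pass the sample input through the model without tracking gradients
with torch.no_grad():
    output = model(sample_input)

# Print the output shape
print(f"Output shape: {output.shape}")
```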
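
To reason about how a 224 x 224 input like the one above shrinks as it passes through the stages, the standard convolution output-size formula can be applied to each convolutional embedding. The sketch below uses only the standard library. The per-stage kernel, stride, padding, and channel values are assumptions chosen for illustration and are not read from this repository's `config.json`, so check that file for the real settings.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after a convolution (standard formula, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# ASSUMED stage settings for illustration only; the actual values live in config.json.
ASSUMED_STAGES = [
    {"kernel": 7, "stride": 4, "padding": 3, "dim": 256},
    {"kernel": 3, "stride": 2, "padding": 1, "dim": 512},
    {"kernel": 3, "stride": 2, "padding": 1, "dim": 1024},
    {"kernel": 3, "stride": 2, "padding": 1, "dim": 2048},
]


def stage_shapes(height: int, width: int, stages=ASSUMED_STAGES):
    """Return (channels, height, width) after each stage's convolutional embedding."""
    shapes = []
    for stage in stages:
        height = conv_output_size(height, stage["kernel"], stage["stride"], stage["padding"])
        width = conv_output_size(width, stage["kernel"], stage["stride"], stage["padding"])
        shapes.append((stage["dim"], height, width))
    return shapes


for index, (dim, h, w) in enumerate(stage_shapes(224, 224), start=1):
    print(f"stage {index}: {dim} channels, {h} x {w}")
```

Under these assumed settings the spatial resolution falls from 56 x 56 after the first stage to 7 x 7 after the fourth. The shape actually printed by the example depends on the repository's configuration and on which tensor the model returns.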

## Files

- `configuration_davit.py`: Contains the `DaViTConfig` class.
- `modeling_davit.py`: Contains the `DaViTModel` class.
- `test_davit_model.py`: Script to test the model.
- `config.json`: Configuration file for the model.
- `model.safetensors`: Pretrained weights of the DaViT model.

## Credits

This model is inspired by and builds upon the ideas presented in the [Florence-2-large model by Microsoft](https://huggingface.co/microsoft/Florence-2-large).