Image captioning model built with the VisionEncoderDecoderModel architecture. It uses "microsoft/swinv2-base-patch4-window12to16-192to256-22kto1k-ft" as the image encoder and "openai/gpt2" as the text decoder. It was trained on a variant of the WikiArt dataset, available at "AterMors/wikiart_recaption".
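A minimal inference sketch follows. It loads the encoder-decoder checkpoint with VisionEncoderDecoderModel, preprocesses an image, and decodes the generated GPT-2 token ids into a caption. It makes several assumptions that the card does not confirm. First, the repo ships its image-processor and tokenizer files and sets decoder_start_token_id in its config. If it does not, load the processor from the Swin2 base checkpoint and the tokenizer from GPT-2 instead. Second, torch, transformers and Pillow are installed. The helper name caption_image and the max_new_tokens default are hypothetical.

```python
import sys

# Repo id as published on the Hugging Face Hub.
MODEL_ID = "AterMors/Swin2-GTP2_art-caption"


def caption_image(image_path: str, max_new_tokens: int = 40) -> str:
    """Generate one caption for the image at image_path (hypothetical helper)."""
    # Heavy dependencies are imported lazily so this helper can be defined
    # without torch/transformers installed.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

    # ASSUMPTION: processor and tokenizer files live in the same repo as the weights.
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # The processor resizes and normalizes the RGB image into encoder pixel values.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # The decoder generates token ids conditioned on the encoder output.
    with torch.no_grad():
        output_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)

    # Token ids -> text, dropping special tokens such as <|endoftext|>.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python caption.py path/to/painting.jpg
    print(caption_image(sys.argv[1]))
```

The model, processor and tokenizer are reloaded on every call for brevity. When captioning many images, load them once and reuse them.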


Weights format: Safetensors
Model size: 240M params
Tensor type: F32
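As a rough check on these figures, F32 stores each parameter in 4 bytes, so 240M parameters come to about 0.96 GB of weights. The sketch below assumes exactly 240,000,000 parameters, but the card's figure is rounded, so the result is approximate. The helper weight_bytes is hypothetical. The F16 entry is included only to show what a half-precision cast would save; the card itself lists F32.

```python
# Approximate weight storage, assuming exactly 240M parameters (rounded on the card).
PARAMS = 240_000_000

# Bytes per stored value; the card lists F32, F16 is shown only for comparison.
BYTES_PER_PARAM = {"F32": 4, "F16": 2}


def weight_bytes(params: int, dtype: str = "F32") -> int:
    """Raw size of the parameter tensors in bytes (hypothetical helper)."""
    return params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for dtype in ("F32", "F16"):
        size = weight_bytes(PARAMS, dtype)
        print(f"{dtype}: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
    # F32: 0.96 GB (0.89 GiB)
    # F16: 0.48 GB (0.45 GiB)
```

This counts raw tensor bytes only. The on-disk Safetensors file also carries a small header, and runtime memory for activations is extra.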

Task: Image-to-Text

Model tree: AterMors/Swin2-GTP2_art-caption is fine-tuned from the base model microsoft/swinv2-base-patch4-window12to16-192to256-22kto1k-ft.

Training dataset: AterMors/wikiart_recaption