laicsiifes/swin-distilbertimbau

gabrielmotablima committed on Sep 2, 2024

Commit a27b459 (verified) · 1 Parent(s): 2535abf

update readme

Files changed (1)

README.md +31 -5

@@ -14,26 +14,52 @@ base_model:
 pipeline_tag: text-generation
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
 ## Model Description
-<!-- Provide a longer summary of what this model is. -->
 ## How to Get Started with the Model
 Use the code below to get started with the model.
-[More Information Needed]
 ### Results
-[More Information Needed]
 **BibTeX:**

 pipeline_tag: text-generation
 ---
+# Swin-DistilBERTimbau
+**Swin-DistilBERTimbau** model trained on **Flickr30K Portuguese** (a version translated with the Google Translator API)
+at resolution 224x224 and a maximum sequence length of 512 tokens.
 ## Model Description
+Swin-DistilBERTimbau is a Vision Encoder Decoder model that leverages the checkpoints of the [Swin Transformer](https://huggingface.co/microsoft/swin-base-patch4-window7-224)
+as encoder and the checkpoints of [DistilBERTimbau](https://huggingface.co/adalbertojunior/distilbert-portuguese-cased) as decoder.
+The encoder checkpoints come from the Swin Transformer version pre-trained on ImageNet-1k at resolution 224x224.
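To make the encoder side of this description more concrete, the sketch below estimates how many visual tokens the decoder cross-attends over for one 224x224 image. The patch size of 4 is read off the checkpoint name (`swin-base-patch4-window7-224`); the embedding width of 128 and the four stages with 2x patch merging between them are assumptions taken from the standard Swin-Base configuration, not from this README. The function name `encoder_output_shape` is hypothetical.

```python
# Rough arithmetic for the Swin encoder output, under assumed Swin-Base settings.
IMAGE_SIZE = 224   # input resolution stated in the model card
PATCH_SIZE = 4     # from the checkpoint name "patch4"
EMBED_DIM = 128    # assumption: Swin-Base initial embedding width
NUM_STAGES = 4     # assumption: standard four-stage Swin layout


def encoder_output_shape(image_size=IMAGE_SIZE, patch_size=PATCH_SIZE,
                         embed_dim=EMBED_DIM, num_stages=NUM_STAGES):
    """Return (number of tokens, hidden size) after the last Swin stage."""
    side = image_size // patch_size  # patches per image side after patch embedding
    dim = embed_dim
    # Each transition between stages merges 2x2 patches: the grid side halves
    # and the channel width doubles.
    for _ in range(num_stages - 1):
        side //= 2
        dim *= 2
    return side * side, dim


num_tokens, hidden_size = encoder_output_shape()
print(num_tokens, hidden_size)
```

Under these assumptions, the caption decoder attends over a short sequence of image tokens rather than over raw pixels, which is what keeps cross-attention cheap at this resolution.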
 ## How to Get Started with the Model
 Use the code below to get started with the model.
+```python
+import requests
+from PIL import Image
+from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel
+# load a fine-tuned image captioning model and corresponding tokenizer and image processor
+model = VisionEncoderDecoderModel.from_pretrained("laicsiifes/swin-distilbert-flickr30k-pt-br")
+tokenizer = AutoTokenizer.from_pretrained("laicsiifes/swin-distilbert-flickr30k-pt-br")
+image_processor = ViTImageProcessor.from_pretrained("laicsiifes/swin-distilbert-flickr30k-pt-br")
+# perform inference on an image
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+pixel_values = image_processor(image, return_tensors="pt").pixel_values
+# generate caption
+generated_ids = model.generate(pixel_values)
+generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(generated_text)
+```
 ### Results
+|Model|Training|Evaluation|CIDEr-D|BLEU@4|ROUGE-L|METEOR|BERTScore|
+|-----|--------|----------|-------|------|-------|------|---------|
+|Swin-DistilBERTimbau|Flickr30K Portuguese|Flickr30K Portuguese|66.73|24.65|39.98|44.71|72.30|
+|Swin-GPT-2|Flickr30K Portuguese|Flickr30K Portuguese|64.71|23.15|39.39|44.36|71.70|
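As a reading aid for the table, the following sketch computes the absolute and relative gap between the two rows for each metric. The scores are copied from the table; the helper name `metric_gains` is hypothetical, and the comparison says nothing about statistical significance.

```python
# Scores copied from the results table: (Swin-DistilBERTimbau, Swin-GPT-2).
SCORES = {
    "CIDEr-D": (66.73, 64.71),
    "BLEU@4": (24.65, 23.15),
    "ROUGE-L": (39.98, 39.39),
    "METEOR": (44.71, 44.36),
    "BERTScore": (72.30, 71.70),
}


def metric_gains(scores):
    """Map each metric to (absolute gain in points, relative gain in percent)."""
    gains = {}
    for metric, (distil, gpt2) in scores.items():
        absolute = distil - gpt2
        relative = 100.0 * absolute / gpt2
        gains[metric] = (absolute, relative)
    return gains


gains = metric_gains(SCORES)
for metric, (absolute, relative) in gains.items():
    print(f"{metric}: {absolute:+.2f} points ({relative:+.1f}% relative)")
```

Swin-DistilBERTimbau scores higher on every metric listed, with the largest absolute gap on CIDEr-D.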
 **BibTeX:**