gabrielmotablima committed on
Commit a27b459 · verified · 1 Parent(s): 2535abf

update readme

Files changed (1)
  1. README.md +31 -5
README.md CHANGED
@@ -14,26 +14,52 @@ base_model:
  pipeline_tag: text-generation
  ---
 
- # Model Card for Model ID
+ # Swin-DistilBERTimbau
 
- <!-- Provide a quick summary of what the model is/does. -->
+ **Swin-DistilBERTimbau** model trained on **Flickr30K Portuguese** (translated with the Google Translate API)
+ at a resolution of 224x224 and a maximum sequence length of 512 tokens.
 
 
  ## Model Description
 
- <!-- Provide a longer summary of what this model is. -->
+ Swin-DistilBERTimbau is a Vision Encoder Decoder model that leverages the checkpoints of the [Swin Transformer](https://huggingface.co/microsoft/swin-base-patch4-window7-224)
+ as the encoder and the checkpoints of [DistilBERTimbau](https://huggingface.co/adalbertojunior/distilbert-portuguese-cased) as the decoder.
+ The encoder checkpoints come from the Swin Transformer version pre-trained on ImageNet-1k at a resolution of 224x224.
 
 
  ## How to Get Started with the Model
 
  Use the code below to get started with the model.
 
- [More Information Needed]
+ ```python
+ import requests
+ from PIL import Image
+
+ from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel
+
+ # load the fine-tuned captioning model, the tokenizer saved with it (via AutoTokenizer) and its image processor
+ model = VisionEncoderDecoderModel.from_pretrained("laicsiifes/swin-distilbert-flickr30k-pt-br")
+ tokenizer = AutoTokenizer.from_pretrained("laicsiifes/swin-distilbert-flickr30k-pt-br")
+ image_processor = ViTImageProcessor.from_pretrained("laicsiifes/swin-distilbert-flickr30k-pt-br")
+
+ # download an example image and convert it to model inputs
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+ image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
+ pixel_values = image_processor(image, return_tensors="pt").pixel_values
+
+ # generate a caption and decode it to text
+ generated_ids = model.generate(pixel_values)
+ generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(generated_text)
+ ```
 
 
  ### Results
 
- [More Information Needed]
+ |Model|Training|Evaluation|CIDEr-D|BLEU@4|ROUGE-L|METEOR|BERTScore|
+ |-----|--------|----------|-------|------|-------|------|---------|
+ |Swin-DistilBERTimbau|Flickr30K Portuguese|Flickr30K Portuguese|66.73|24.65|39.98|44.71|72.30|
+ |Swin-GPT-2|Flickr30K Portuguese|Flickr30K Portuguese|64.71|23.15|39.39|44.36|71.70|
 
  **BibTeX:**
 
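The Model Description in the README diff above pairs a Swin Transformer encoder (224x224 input) with a DistilBERTimbau decoder. The sketch below is a rough illustration of what the decoder receives: it traces the Swin token grid stage by stage. It assumes the standard Swin-B settings: patch size 4, window size 7, embedding dimension 128, and four stages with 2x2 patch merging between them. These values are inferred from the checkpoint name and the original Swin design rather than stated in the README. `swin_stage_shapes` is a hypothetical helper, not part of Transformers.

```python
# Hypothetical helper tracing a Swin encoder's token grid, assuming Swin-B settings
# (patch size 4, window size 7, embedding dim 128, 4 stages). These values are
# assumptions inferred from the checkpoint name, not taken from the README.


def swin_stage_shapes(image_size=224, patch_size=4, window_size=7, embed_dim=128, num_stages=4):
    """Return (grid side, tokens, channels, attention windows) for each stage."""
    side = image_size // patch_size  # patch partition: 224 / 4 = 56 patches per side
    channels = embed_dim
    shapes = []
    for stage in range(num_stages):
        windows = (side // window_size) ** 2  # non-overlapping 7x7 windows in this stage
        shapes.append((side, side * side, channels, windows))
        if stage < num_stages - 1:
            side //= 2  # 2x2 patch merging halves the grid side
            channels *= 2  # and doubles the channel dimension
    return shapes


for i, (side, tokens, channels, windows) in enumerate(swin_stage_shapes(), start=1):
    print(f"stage {i}: {side}x{side} grid, tokens={tokens}, channels={channels}, windows={windows}")
```

Under those assumptions, the encoder hands the decoder 49 visual tokens with 1024 channels each. The decoder's hidden size may differ from that. For example, 768 would be the width of a BERT-base-sized model, which is an assumption about DistilBERTimbau. When the sizes differ, `VisionEncoderDecoderModel` adds a linear `enc_to_dec_proj` layer that maps the encoder outputs to the decoder width, provided the decoder config leaves `cross_attention_hidden_size` unset.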
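The short script below makes the comparison in the Results table explicit by computing the per-metric margin of Swin-DistilBERTimbau over Swin-GPT-2. The scores are copied verbatim from the table. `margins` is a hypothetical helper for illustration only.

```python
# Scores copied from the Results table (trained and evaluated on Flickr30K Portuguese).
metrics = ["CIDEr-D", "BLEU@4", "ROUGE-L", "METEOR", "BERTScore"]
swin_distilbertimbau = [66.73, 24.65, 39.98, 44.71, 72.30]
swin_gpt2 = [64.71, 23.15, 39.39, 44.36, 71.70]


def margins(a, b, names=metrics):
    """Hypothetical helper: per-metric difference a - b, rounded to two decimals."""
    return {name: round(x - y, 2) for name, x, y in zip(names, a, b)}


for name, delta in margins(swin_distilbertimbau, swin_gpt2).items():
    # The "+" format flag always prints the sign, so a negative margin would show as "-".
    print(f"{name}: {delta:+.2f}")
```

On these numbers, Swin-DistilBERTimbau leads on every metric. The largest gap is on CIDEr-D (2.02), followed by BLEU@4 (1.50). The gaps on ROUGE-L, METEOR and BERTScore are each under one point.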