Commit c702c34 · Silvia Terragni committed
Parent(s): 1ed0e02

simple fixes on introduction.md

Files changed: introduction.md (+7 −6)
introduction.md
CHANGED
@@ -29,10 +29,10 @@ is going to compute the similarity between the image and each label. The webapp
 
 The original CLIP model was trained on 400 million image-text pairs; this amount of data is not available for Italian.
 We indeed worked in a **low-resource setting**. The only datasets for Italian captioning in the literature are MSCOCO-IT (a translated version of MSCOCO) and WIT.
-To get competitive results we followed three strategies:
-1. more and better data;
-2. better augmentations;
-3. better training.
+To get competitive results we followed three strategies:
+1. more and better data;
+2. better augmentations;
+3. better training.
 
 ## More and Better Data
 
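The webapp mentioned in this hunk's context scores an image against each candidate label. A minimal sketch of that CLIP-style scoring, using OpenAI's public English checkpoint as a stand-in (the image path and label strings are made up; the Italian demo would load its own fine-tuned checkpoint):

```python
# Sketch of CLIP-style image-label similarity scoring.
# "photo.jpg" and the labels are placeholders, not from the project.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
labels = ["un gatto", "un cane", "due cavalli marroni"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image-text similarity for each label;
# softmax turns the scores into a distribution over the labels.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0]):
    print(f"{label}: {p:.3f}")
```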
@@ -80,10 +80,10 @@ Our implementation is available online [here](https://github.com/clip-italian/cl
 
 ### Backbone Freezing
 
-The ViT used by OpenAI was already trained on
+The ViT used by OpenAI was already trained on 400 million images, and it is probably the element of our architecture that required the least training.
 The same is true for the BERT model we use. To allow the randomly initialized Re-projection Layers to warm up without disturbing the tuned weights of the backbones, we ran a first training pass with the backbones of our architecture completely frozen. Only after these layers had converged did we unfreeze the rest of the model and fine-tune all the components. This technique allowed us to reach a much better validation loss.
 
-<img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/clip-italian.png" alt="drawing" width="
+<img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/clip-italian.png" alt="drawing" width="80%"/>
 
 # Scientific Validity
 
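A minimal sketch of the freeze-then-unfreeze schedule described in this hunk, written in PyTorch purely for illustration (the tiny stand-in model, attribute names, and training placeholders are assumptions, not the project's actual training code, which is linked above):

```python
# Illustrative two-phase training: freeze backbones first, unfreeze later.
import torch.nn as nn

class DualEncoder(nn.Module):
    def __init__(self, dim=32, proj_dim=16):
        super().__init__()
        self.vision_backbone = nn.Linear(dim, dim)   # stand-in for the ViT
        self.text_backbone = nn.Linear(dim, dim)     # stand-in for BERT
        self.vision_proj = nn.Linear(dim, proj_dim)  # randomly initialized
        self.text_proj = nn.Linear(dim, proj_dim)    # re-projection layers

    def forward(self, img, txt):
        v = self.vision_proj(self.vision_backbone(img))
        t = self.text_proj(self.text_backbone(txt))
        return v, t

def set_trainable(module: nn.Module, flag: bool) -> None:
    for param in module.parameters():
        param.requires_grad = flag

model = DualEncoder()

# Phase 1: freeze both backbones so only the projections warm up.
set_trainable(model.vision_backbone, False)
set_trainable(model.text_backbone, False)
# ... train until the projection layers converge ...

# Phase 2: unfreeze everything and fine-tune the full model end to end.
set_trainable(model.vision_backbone, True)
set_trainable(model.text_backbone, True)
# ... continue training ...
```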
@@ -166,6 +166,7 @@ And what about "two cats"?
 ### Complex Queries
 Have you ever seen "two brown horses"?
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/due_cavalli_marroni.png" alt="drawing" width="600"/>
+
 And finally, here's a very nice "cat on a chair".
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/gatto_su_sedia.png" alt="drawing" width="600"/>
 
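Under the hood, queries like these reduce to ranking precomputed image embeddings by cosine similarity against the encoded text. A toy sketch, with made-up file names and random tensors standing in for real embeddings:

```python
# Toy text-to-image retrieval: rank a gallery of image embeddings
# by cosine similarity with a query embedding.
import torch

def search(query_embedding: torch.Tensor,
           image_embeddings: torch.Tensor,
           filenames: list[str],
           top_k: int = 5) -> list[str]:
    # Normalize so the dot product equals cosine similarity.
    q = query_embedding / query_embedding.norm()
    imgs = image_embeddings / image_embeddings.norm(dim=-1, keepdim=True)
    scores = imgs @ q
    best = scores.topk(top_k).indices
    return [filenames[i] for i in best]

# Random tensors stand in for embeddings produced by the text/image encoders.
query = torch.randn(512)
gallery = torch.randn(1000, 512)
names = [f"img_{i}.jpg" for i in range(1000)]
print(search(query, gallery, names))
```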