Merge branch 'main' of https://huggingface.co/spaces/clip-italian/clip-italian-demo into main
introduction.md CHANGED (+6 -3)
@@ -66,12 +66,11 @@ We considered four main sources of data:
 [Srinivasan et al., 2021](https://arxiv.org/pdf/2103.01913.pdf)). We focused on the *Reference Description* captions
 described in the paper as they are the ones of highest quality. Nonetheless, many of these captions describe ontological knowledge and encyclopedic facts (e.g., Roberto Baggio in 1994).
 However, this kind of text, without more information, is not useful to learn a good mapping between images and captions.
-
-are still good (e.g., "running dog"). Thus, to prevent polluting the data with captions that are not meaningful, we used *POS tagging*
+To prevent polluting the data with captions that are not meaningful, we used *POS tagging*
 on the text and removed all the captions that were composed of 80% or more PROPN tokens (around 10% of the data). This is a simple solution that allowed us to retain much
 of the dataset, without introducing noise.
 
-Captions like
+Captions like *'Dora Riparia', 'Anna Maria Mozzoni', 'Joey Ramone Place', 'Kim Rhodes', 'Ralph George Hawtrey'* have been removed.
 
 + [MSCOCO-IT](https://github.com/crux82/mscoco-it). This image-caption dataset comes from the work by [Scaiella et al., 2019](http://www.ai-lc.it/IJCoL/v5n2/IJCOL_5_2_3___scaiella_et_al.pdf). The captions come from the original
 MSCOCO dataset and have been translated with Microsoft Translator. The 2017 version of the MSCOCO training set contains more than
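The caption filter changed in the hunk above is simple to reproduce. The sketch below assumes spaCy's Italian pipeline as the POS tagger and applies the 80% PROPN threshold over non-punctuation tokens; the write-up does not specify which tagger or tokenization the authors used, so treat these choices as illustrative.

```python
# Minimal sketch of the PROPN-based caption filter described in the hunk above.
# The write-up does not say which POS tagger was used; spaCy's Italian pipeline
# ("it_core_news_sm", installed via `python -m spacy download it_core_news_sm`)
# is assumed here purely for illustration.
import spacy

nlp = spacy.load("it_core_news_sm")

def is_mostly_proper_nouns(caption: str, threshold: float = 0.8) -> bool:
    """True if `threshold` or more of the caption's tokens are tagged PROPN."""
    tokens = [t for t in nlp(caption) if not t.is_punct and not t.is_space]
    if not tokens:
        return True  # an empty caption carries no useful signal either
    return sum(t.pos_ == "PROPN" for t in tokens) / len(tokens) >= threshold

captions = ["Dora Riparia", "Un cane che corre sulla spiaggia"]
kept = [c for c in captions if not is_mostly_proper_nouns(c)]
print(kept)  # only the descriptive "running dog" style caption survives the filter
```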
@@ -265,16 +264,20 @@ Look at the following - slightly cherry picked - examples:
 
 ### Colors
 Here's "a yellow flower"
+
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/fiore_giallo.png" alt="drawing" width="500"/>
 
 And here's "a blue flower"
+
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/fiore_blu.png" alt="drawing" width="500"/>
 
 ### Counting
 What about "one cat"?
+
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/gatto.png" alt="drawing" width="500"/>
 
 And what about "two cats"?
+
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/due_gatti.png" alt="drawing" width="500"/>
 
 ### Complex Queries
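The color and counting screenshots referenced in this hunk come from plain text-to-image retrieval. The sketch below shows only the generic ranking step of any CLIP-style model, with random placeholder vectors standing in for the CLIP-Italian text and image embeddings; it is not the demo's actual code.

```python
# Model-agnostic sketch of the ranking step behind these screenshots: a
# CLIP-style model embeds the Italian query and every image in a shared
# space, and images are sorted by cosine similarity. The random vectors
# below are placeholders for the CLIP-Italian text and image embeddings.
import numpy as np

def rank_images(text_emb: np.ndarray, image_embs: np.ndarray) -> np.ndarray:
    """Return image indices ordered from best to worst match for the query."""
    text_emb = text_emb / np.linalg.norm(text_emb)
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = image_embs @ text_emb   # cosine similarity per image
    return np.argsort(-scores)       # highest similarity first

rng = np.random.default_rng(0)
query = rng.normal(size=512)             # e.g. the embedding of "un fiore giallo"
gallery = rng.normal(size=(1000, 512))   # pre-computed image embeddings
print(rank_images(query, gallery)[:5])   # indices of the top-5 images
```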