# Italian CLIP

With a few tricks, we have been able to fine-tune a competitive Italian CLIP model with only 1.4 million training samples.

In building this project we kept in mind the following things:

+ **Novel Contributions**: We created a dataset of ~1.4 million Italian image-text pairs and, to our knowledge, trained the best Italian CLIP model currently in existence;
+ **Scientific Validity**: Claims are easy, facts are hard. Validation is important to assess the real impact of a model, so we thoroughly evaluated our models and made the validation reproducible for everybody;
+ **Broader Outlook**: we always considered the possible use cases for this model.

We put our **hearts** and **souls** into the project during this week! Not only did we work on a cool project, but we were able to make new friends and learn a lot from each other while working towards a common goal!

Thank you for this amazing opportunity; we hope you will like the results. :heart:

# Novel Contributions

The original CLIP model was trained on 400 million image-text pairs; this amount of data is not available for Italian, and the only captioning datasets in the literature are MSCOCO-IT (a translated version of MSCOCO) and WIT. To get competitive results we followed three strategies: 1) more data, 2) better augmentations and 3) better training.
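
As background for the "better training" strategy: CLIP-style models optimize a symmetric contrastive loss over each batch of image-text pairs. The sketch below is illustrative only, not the project's actual training code:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss: the i-th image and i-th caption are a
    positive pair; every other pairing in the batch is a negative."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (batch, batch) cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Cross-entropy over rows (image -> text) and over columns (text -> image).
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```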

## More Data

We eventually had to deal with the fact that we do not have the same data that OpenAI had during the training of CLIP.
Thus, we tried to add as much data as possible while keeping data quality high.
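
As a toy illustration of that quantity-quality trade-off (hypothetical heuristics, not the project's actual filtering pipeline), a caption-quality gate might look like this:

```python
def keep_pair(caption: str, min_words: int = 3, max_words: int = 64) -> bool:
    """Toy quality filter: drop captions that are too short, too long,
    or mostly non-alphabetic (a stand-in for real quality heuristics)."""
    words = caption.split()
    if not (min_words <= len(words) <= max_words):
        return False
    alpha = sum(ch.isalpha() for ch in caption)
    return alpha / max(len(caption), 1) > 0.6
```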

We considered three main sources of data:

[…]

We selected two different tasks:

+ image-retrieval
+ zero-shot classification

### Image Retrieval

| MRR | CLIP-Italian | mCLIP |
| --------------- | ------------ |-------|

[…]
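MRR (Mean Reciprocal Rank) averages, over all caption queries, the reciprocal of the rank at which the matching image is retrieved. As an illustrative sketch (our own helper, not the project's evaluation code), MRR@K can be computed from a caption-to-image similarity matrix like this:

```python
import numpy as np

def mrr_at_k(similarity: np.ndarray, k: int) -> float:
    """similarity[i, j] = score of image j for caption query i;
    the correct image for query i is assumed to be image i."""
    reciprocal_ranks = []
    for i, scores in enumerate(similarity):
        top_k = np.argsort(scores)[::-1][:k]      # best-scoring images first
        hit = np.where(top_k == i)[0]
        # Ranks are 1-based; contribute 0 if the match is outside the top k.
        reciprocal_ranks.append(1.0 / (hit[0] + 1) if hit.size else 0.0)
    return float(np.mean(reciprocal_ranks))

# Toy example with 3 caption-image pairs:
sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.2, 0.8],
                [0.1, 0.7, 0.4]])
print(mrr_at_k(sim, k=3))  # (1 + 1/3 + 1/2) / 3 ≈ 0.61
```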
### Zero-shot classification

| Accuracy | CLIP-Italian | mCLIP |
| --------------- | ------------ |-------|
| Accuracy@1 | | |
| Accuracy@5 | | |
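
Zero-shot classification with a CLIP-style model scores the image against one caption per candidate class and picks the best-scoring caption. A minimal sketch with the transformers library follows; the checkpoint id and the Italian prompt wording are placeholders, not necessarily what this project used:

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Placeholder checkpoint: swap in the CLIP-Italian weights to classify in Italian.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
# One caption per class, e.g. Italian for "a photo of a cat/dog/car".
labels = ["una foto di un gatto", "una foto di un cane", "una foto di una macchina"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-to-caption similarity scores

probs = logits.softmax(dim=-1)[0]
print({label: round(p.item(), 3) for label, p in zip(labels, probs)})
```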
|