VictorSanh
commited on
Commit
•
ee0a04d
1
Parent(s):
25607c2
cleanign
Browse files
README.md
CHANGED
@@ -271,7 +271,6 @@ For ImageNet-1k, we also report results where the priming samples are selected t
|
|
271 |
|
272 |
Similarly to the base IDEFICS models, we performed checkpoint selection to stop the training. Given that M3IT contains in the training set a handful of the benchmarks we were evaluating on, we used [MMBench](https://huggingface.co/papers/2307.06281) as a held-out validation benchmark to perform checkpoint selection. We select the checkpoint at step 3'000 for IDEFICS-80b-instruct and at step 8'000 for IDEFICS-9b-instruct.
|
273 |
|
274 |
-
Idefics Instruct Evaluations:
|
275 |
| Model | Shots | <nobr>VQAv2 <br>OE VQA acc.</nobr> | <nobr>OKVQA <br>OE VQA acc.</nobr> | <nobr>TextVQA <br>OE VQA acc.</nobr> | <nobr>VizWiz<br>OE VQA acc.</nobr> | <nobr>TextCaps <br>CIDEr</nobr> | <nobr>Coco <br>CIDEr</nobr> | <nobr>NoCaps<br>CIDEr</nobr> | <nobr>Flickr<br>CIDEr</nobr> | <nobr>VisDial <br>NDCG</nobr> | <nobr>HatefulMemes<br>ROC AUC</nobr> | <nobr>ScienceQA <br>acc.</nobr> | <nobr>RenderedSST2<br>acc.</nobr> | <nobr>Winoground<br>group (text/image)</nobr> |
|
276 |
| :--------------------- | --------: | ---------------------: | ---------------------: | -----------------------: | ----------------------: | -------------------: | ---------------: | -----------------: | -----------------: | -----------------: | -------------------------: | -----------------------: | --------------------------: | ----------------------------------: |
|
277 |
| Finetuning data does not contain dataset | - | ❌ | ❌ | ❌ | ✔ | ❌ | ❌ | ✔ | ✔ | ❌ | ✔ | ❌ | ✔ | ❌ |
|
@@ -287,8 +286,6 @@ Idefics Instruct Evaluations:
|
|
287 |
| | 16 | 66.8 (9.8) | 51.7 (3.3) | 31.6 (3.7) | 44.8 (2.3) | 70.2 (2.7) | 128.8 (29.1) | 101.5 (12.2) | 75.8 (11.4) | - | 51.7 (0.7) | - | 63.3 (-4.6) | - |
|
288 |
| | 32 | 66.9 (9.0) | 52.3 (2.7) | 32.0 (3.7) | 46.0 (2.2) | 71.7 (3.6) | 127.8 (29.8) | 101.0 (10.5) | 76.3 (11.9) | - | 50.8 (1.0) | - | 60.9 (-6.1) | - |
|
289 |
|
290 |
-
*Starred evaluations are also part of the intsruction data mix
|
291 |
-
|
292 |
|
293 |
# Technical Specifications
|
294 |
|
|
|
271 |
|
272 |
Similarly to the base IDEFICS models, we performed checkpoint selection to stop the training. Given that M3IT contains in the training set a handful of the benchmarks we were evaluating on, we used [MMBench](https://huggingface.co/papers/2307.06281) as a held-out validation benchmark to perform checkpoint selection. We select the checkpoint at step 3'000 for IDEFICS-80b-instruct and at step 8'000 for IDEFICS-9b-instruct.
|
273 |
|
|
|
274 |
| Model | Shots | <nobr>VQAv2 <br>OE VQA acc.</nobr> | <nobr>OKVQA <br>OE VQA acc.</nobr> | <nobr>TextVQA <br>OE VQA acc.</nobr> | <nobr>VizWiz<br>OE VQA acc.</nobr> | <nobr>TextCaps <br>CIDEr</nobr> | <nobr>Coco <br>CIDEr</nobr> | <nobr>NoCaps<br>CIDEr</nobr> | <nobr>Flickr<br>CIDEr</nobr> | <nobr>VisDial <br>NDCG</nobr> | <nobr>HatefulMemes<br>ROC AUC</nobr> | <nobr>ScienceQA <br>acc.</nobr> | <nobr>RenderedSST2<br>acc.</nobr> | <nobr>Winoground<br>group (text/image)</nobr> |
|
275 |
| :--------------------- | --------: | ---------------------: | ---------------------: | -----------------------: | ----------------------: | -------------------: | ---------------: | -----------------: | -----------------: | -----------------: | -------------------------: | -----------------------: | --------------------------: | ----------------------------------: |
|
276 |
| Finetuning data does not contain dataset | - | ❌ | ❌ | ❌ | ✔ | ❌ | ❌ | ✔ | ✔ | ❌ | ✔ | ❌ | ✔ | ❌ |
|
|
|
286 |
| | 16 | 66.8 (9.8) | 51.7 (3.3) | 31.6 (3.7) | 44.8 (2.3) | 70.2 (2.7) | 128.8 (29.1) | 101.5 (12.2) | 75.8 (11.4) | - | 51.7 (0.7) | - | 63.3 (-4.6) | - |
|
287 |
| | 32 | 66.9 (9.0) | 52.3 (2.7) | 32.0 (3.7) | 46.0 (2.2) | 71.7 (3.6) | 127.8 (29.8) | 101.0 (10.5) | 76.3 (11.9) | - | 50.8 (1.0) | - | 60.9 (-6.1) | - |
|
288 |
|
|
|
|
|
289 |
|
290 |
# Technical Specifications
|
291 |
|