HuggingFaceM4/idefics2-8b-base


VictorSanh committed on May 6, 2024

Commit 2a8ad32 · verified · 1 parent: b3e370b

Update README.md

Files changed (1)

README.md +11 -1

README.md CHANGED

@@ -51,7 +51,8 @@ We release under the Apache 2.0 license 2 checkpoints:
 - **Resources for more information:**
     - Description of [OBELICS](https://huggingface.co/datasets/HuggingFaceM4/OBELICS): [OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
 ](https://huggingface.co/papers/2306.16527)
-    - Paper: Coming soon
+    - Paper: [What matters when building vision-language models?
+](https://huggingface.co/papers/2405.02246)
 # Uses
@@ -439,6 +440,15 @@ The model is built on top of two pre-trained models: [google/siglip-so400m-patch
       archivePrefix={arXiv},
       primaryClass={cs.IR}
 }
+@misc{laurençon2024matters,
+      title={What matters when building vision-language models?},
+      author={Hugo Laurençon and Léo Tronchon and Matthieu Cord and Victor Sanh},
+      year={2024},
+      eprint={2405.02246},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
 ```
 # Acknowledgements