google/pix2struct-textcaps-large

image-text-to-text

ybelkada committed on Mar 14, 2023

Commit f49ed78 • 1 Parent(s): 9cd4270

Update README.md

Files changed (1)

README.md +23 -0

@@ -168,3 +168,26 @@ print(processor.decode(predictions[0], skip_special_tokens=True))
 # Contribution
 This model was originally contributed by Kenton Lee, Mandar Joshi et al. and added to the Hugging Face ecosystem by [Younes Belkada](https://huggingface.co/ybelkada).

+# Citation
+If you want to cite this work, please consider citing the original paper:
+```
+@misc{https://doi.org/10.48550/arxiv.2210.03347,
+  doi = {10.48550/ARXIV.2210.03347},
+  url = {https://arxiv.org/abs/2210.03347},
+  author = {Lee, Kenton and Joshi, Mandar and Turc, Iulia and Hu, Hexiang and Liu, Fangyu and Eisenschlos, Julian and Khandelwal, Urvashi and Shaw, Peter and Chang, Ming-Wei and Toutanova, Kristina},
+  keywords = {Computation and Language (cs.CL), Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},
+  title = {Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding},
+  publisher = {arXiv},
+  year = {2022},
+  copyright = {Creative Commons Attribution 4.0 International}
+}
+```