4M: Massively Multimodal Masked Modeling
Generate images from text prompts
Compare different visual question answering models
Convert screenshots to HTML code