Generate captions for images in various styles
Convert GUI screens to structured elements
Generate masked images and answers based on queries