Describe images in response to questions
Identify and segment objects in images
Transcribe audio and identify background sounds