Generate customized images using text and an ID image
Generate text based on an image and prompt
Multimodal Language Model
More advanced and challenging multi-task evaluation
VLMEvalKit Evaluation Results Collection