A leaderboard for multimodal models
Engage in multimodal conversations with images and videos
Chat with images and text using Qwen-VL-Plus
Retrieve images using audio, text, or both
Answer questions about images
Generate images from text