| Model | Open-Ended VQA: % Human Rating | Multiple-Choice VQA: % Accuracy | Hints, Multiple-Choice VQA: % Accuracy | Attributions, Multiple-Choice VQA: % Accuracy | Reference-Based Automatic Evaluation: Accuracy of Judge Prediction Compared to Human Ratings | Reference-Free Automatic Evaluation: Accuracy of Judge Prediction Compared to Human Ratings | Automatic Evaluation: % Auto-Rater Ratings | Hints, Automatic Evaluation: % Auto-Rater Ratings | Attributions, Automatic Evaluation: % Auto-Rater Ratings |
|---|---|---|---|---|---|---|---|---|---|
| Humans | 82 | * | * | * | * | * | 78 | * | * |
| Gemini Pro 1.5 | 40 | 38 | 66 | 72 | 87 | 52 | 53 | 62 | 29 |
| Gemini Pro Vision | 30 | 41 | 62 | * | 75 | 38 | 34 | 47 | |
| GPT4 | 34 | 45 | 69 | 82 | 86 | 51 | 38 | 61 | 25 |
| LLaVA-1.6-34B | 15 | 24 | 30 | * | 76 | 43 | 21 | 16 | * |
| LLaVA-1.5-7B | 13 | 17 | 29 | * | 70 | 35 | 19 | 30 | * |
| InstructBLIP | 13 | * | * | * | * | * | 20 | 28 | * |
| Gemini Pro 1.5 Caption → Gemini Pro 1.5 | 23 | * | * | * | * | * | * | * | * |
| Human (Oracle) Caption → Gemini Pro 1.5 | 50 | * | * | * | * | * | * | * | * |
| Claude 3.5 Sonnet | * | 46 | 45 | * | * | * | 39 | * | * |
| GPT4o | * | 55 | 83 | * | * | * | 50 | * | * |
| Qwen-VL-Max | * | 35 | 53 | * | * | * | 26 | * | * |
| Molmo-7B | * | 34 | 42 | * | * | * | 36 | * | * |