Model	Open-Ended VQA: % Human Rating	Multiple Choice VQA: % Accuracy	Hints-Multiple Choice VQA: % Accuracy	Attributions-Multiple Choice VQA: % Accuracy	Reference-Based Automatic Evaluation: Accuracy of Judge Prediction Compared to Human Ratings	Reference-Free Automatic Evaluation: Accuracy of Judge Prediction Compared to Human Ratings	Automatic Evaluation: % Auto-Rater Ratings	Hints-Automatic Evaluation: % Auto-Rater Ratings	Attributions-Automatic Evaluation: % Auto-Rater Ratings
Humans	82	*	*	*	*	*	78	*	*
Gemini Pro 1.5	40	38	66	72	87	52	53	62	29
Gemini Pro Vision	30	41	62	*	75	38	34	47	*
GPT4	34	45	69	82	86	51	38	61	25
LLaVA-1.6-34B	15	24	30	*	76	43	21	16	*
LLaVA-1.5-7B	13	17	29	*	70	35	19	30	*
InstructBlip	13	*	*	*	*	*	20	28	*
Gemini Pro 1.5 Caption _ Gemini Pro 1.5	23	*	*	*	*	*	*	*	*
Human (Oracle) Caption _ Gemini Pro 1.5	50	*	*	*	*	*	*	*	*
Claude 3.5 Sonnet	*	46	45	*	*	*	39	*	*
GPT4o	*	55	83	*	*	*	50	*	*
Qwen-VL-Max	*	35	53	*	*	*	26	*	*
Molmo-7B	*	34	42	*	*	*	36	*	*