Spaces: MM-UPD/MM-UPD_Leaderboard

AtsuMiyai committed on Jun 4, 2024

Commit 0ca2470 • 1 Parent(s): ffb6f1b

update explanations on MM-UPD Bench

Files changed (1)

constants.py CHANGED

@@ -35,8 +35,7 @@ LEADERBORAD_INTRODUCTION = """
 <a href='https://arxiv.org/abs/2403.20331'><img src='https://img.shields.io/badge/cs.CV-Paper-b31b1b?logo=arxiv&logoColor=red'></a>
 </div>
-## About MM-UPD Bench
-### What is MM-UPD Bench?
+## What is MM-UPD Bench?
 MM-UPD Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Vision Language Models (VLMs) in the Context of Unsolvable Problem Detection (UPD)
 Our MM-UPD Bench encompasses three benchmarks: MM-AAD, MM-IASD, and MM-IVQD.
@@ -54,7 +53,7 @@ MM-IVQD Bench is a dataset where the question is incompatible with the image.
 MM-IVQD evaluates the VLMs' capability to discern when a question and image are irrelevant or inappropriate.
-### Characteristics of MM-UPD Bench
+## Characteristics of MM-UPD Bench
 We design MM-UPD Bench to provide a comprehensive evaluation of VLMs across multiple senarios.
 1. **Multiple Senario Evaluation:** We carefully design prompts choices and examine the three senario: (i) Base (w/o instruction), (ii) Option (w/ additional option), (iii) Instruction (w/ additional instruction).
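
The three settings named in the introduction text (Base, Option, Instruction) can be pictured with a small prompt-building sketch. This is only an illustration and not code from the leaderboard repository: the function `build_prompt`, the constants `EXTRA_OPTION` and `EXTRA_INSTRUCTION`, and their wording are all hypothetical, and the actual benchmark prompts may differ.

```python
from typing import List

# Setting names follow the introduction text: Base, Option, Instruction.
SCENARIOS = ("base", "option", "instruction")

# Hypothetical wording, assumed for illustration only.
EXTRA_OPTION = "None of the above"
EXTRA_INSTRUCTION = 'If all the options are incorrect, answer "None of the above".'


def build_prompt(question: str, options: List[str], scenario: str) -> str:
    """Build a multiple-choice prompt for one of the three settings."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario!r}")

    opts = list(options)  # copy so the caller's list is not modified
    if scenario == "option":
        # Option setting: one additional answer choice is appended.
        opts.append(EXTRA_OPTION)

    lines = [question]
    for i, opt in enumerate(opts):
        # Label choices A, B, C, ...
        lines.append(f"{chr(ord('A') + i)}. {opt}")

    if scenario == "instruction":
        # Instruction setting: an additional instruction follows the choices.
        lines.append(EXTRA_INSTRUCTION)

    return "\n".join(lines)


if __name__ == "__main__":
    for name in SCENARIOS:
        print(f"--- {name} ---")
        print(build_prompt("What is shown in the image?", ["A cat", "A dog"], name))
```

Keeping the three settings in one function makes it clear that they differ only in what is appended to the same question and choices.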