Commit 0ca2470 by AtsuMiyai (parent: ffb6f1b)
update explanations on MM-UPD Bench
Files changed: constants.py (+2 -3)
constants.py CHANGED
@@ -35,8 +35,7 @@ LEADERBORAD_INTRODUCTION = """
 <a href='https://arxiv.org/abs/2403.20331'><img src='https://img.shields.io/badge/cs.CV-Paper-b31b1b?logo=arxiv&logoColor=red'></a>
 </div>
 
-##
-### What is MM-UPD Bench?
+## What is MM-UPD Bench?
 MM-UPD Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Vision Language Models (VLMs) in the Context of Unsolvable Problem Detection (UPD)
 
 Our MM-UPD Bench encompasses three benchmarks: MM-AAD, MM-IASD, and MM-IVQD.
@@ -54,7 +53,7 @@ MM-IVQD Bench is a dataset where the question is incompatible with the image.
 MM-IVQD evaluates the VLMs' capability to discern when a question and image are irrelevant or inappropriate.
 
 
-
+## Characteristics of MM-UPD Bench
 We design MM-UPD Bench to provide a comprehensive evaluation of VLMs across multiple senarios.
 
 1. **Multiple Senario Evaluation:** We carefully design prompts choices and examine the three senario: (i) Base (w/o instruction), (ii) Option (w/ additional option), (iii) Instruction (w/ additional instruction).
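The three settings named in the introduction text (Base, Option, Instruction) can be illustrated with a minimal Python sketch. Only the setting names come from the text above. The function name `build_prompt`, the constants, and the exact wording of the extra option and the extra instruction are assumptions for illustration, not the benchmark's actual prompts.

```python
from string import ascii_uppercase

# Assumed wording: the source only says "additional option" and
# "additional instruction", it does not give the actual text.
EXTRA_OPTION = "None of the above"
EXTRA_INSTRUCTION = 'If all the options are incorrect, answer "None of the above".'

SETTINGS = {"base", "option", "instruction"}


def build_prompt(question, options, setting="base"):
    """Build a multiple-choice prompt for one of the three settings.

    base        -> question and options only (no extra instruction)
    option      -> one extra answer option is appended
    instruction -> one extra instruction line is appended
    """
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting!r}")

    opts = list(options)
    if setting == "option":
        opts.append(EXTRA_OPTION)

    # Label the options A, B, C, ... in order.
    lines = [question]
    lines += [f"{letter}. {text}" for letter, text in zip(ascii_uppercase, opts)]

    if setting == "instruction":
        lines.append(EXTRA_INSTRUCTION)
    return "\n".join(lines)


if __name__ == "__main__":
    for name in ("base", "option", "instruction"):
        print(f"--- {name} ---")
        print(build_prompt("What is shown in the image?", ["A cat", "A dog"], name))
```

Keeping the setting as a single argument makes it easy to run the same question under all three settings and compare a model's answers side by side.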