add text to about tab
src/about.py (+8 -2)
@@ -20,11 +20,17 @@ INTRODUCTION_TEXT = """
 """
 
 LLM_BENCHMARKS_TEXT = f"""
-
+This leaderboard presents hallucination benchmarks for multimodal LLMs on tasks with different input modalities, including image captioning and video captioning. For each task, we measure the hallucination level of the text output of various multimodal LLMs using existing hallucination metrics.
+
+Some metrics, such as POPE*, CHAIR, and UniHD, are designed specifically for image-to-text tasks and thus are not directly applicable to video-to-text tasks. For the image-to-text benchmark, we also provide a ranking based on human ratings, where annotators were asked to rate the outputs of the multimodal LLMs on MHaluBench. *Note that the POPE paper proposed both a dataset and a method.
+
+More information about each existing metric can be found in the corresponding paper, and CrossCheckGPT is proposed in https://arxiv.org/pdf/2405.13684.
+
+Currently, the leaderboard does not yet support automatic evaluation of new models, but you are welcome to request an evaluation of a new model by creating a new discussion or by emailing us at [email protected].
 """
 
 EVALUATION_QUEUE_TEXT = """
-
+Currently, the leaderboard does not yet support automatic evaluation of new models, but you are welcome to request an evaluation of a new model by creating a new discussion or by emailing us at [email protected].
 """
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
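For context on how these constants reach the UI, here is a minimal, hypothetical sketch of how the About-tab text might be rendered. It assumes the Space is a Gradio app with an `app.py` that imports `src/about.py`; neither the app file, the tab names, nor the component layout appears in this diff.

```python
# Hypothetical app.py sketch -- NOT part of this commit. Assumes a Gradio-based
# Space whose src/about.py exposes the constants touched in the diff above.
import gradio as gr

from src.about import (
    INTRODUCTION_TEXT,
    LLM_BENCHMARKS_TEXT,
    EVALUATION_QUEUE_TEXT,
    CITATION_BUTTON_LABEL,
)

with gr.Blocks() as demo:
    # Intro text shown above the tabs.
    gr.Markdown(INTRODUCTION_TEXT)

    with gr.Tabs():
        with gr.Tab("About"):
            # The benchmark description added in this commit.
            gr.Markdown(LLM_BENCHMARKS_TEXT)
        with gr.Tab("Submit"):
            # Explains how to request an evaluation of a new model.
            gr.Markdown(EVALUATION_QUEUE_TEXT)

    # The label is typically paired with a copyable citation snippet;
    # the actual citation string is not shown in this diff.
    gr.Textbox(label=CITATION_BUTTON_LABEL, lines=6, show_copy_button=True)

if __name__ == "__main__":
    demo.launch()
```

Keeping all user-facing copy in src/about.py lets text such as the benchmark description above be edited without touching the layout code.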