Update README.md
```diff
--- a/README.md
+++ b/README.md
@@ -34,7 +34,7 @@ averaged among all the evaluation aspects as indicator.
 For GenAI-Bench and VBench, which include human preference data among two or more videos,
 we employ the model's output to predict preferences and use pairwise accuracy as the performance indicator.
 | metric | Final Sum Score | VideoEval-test | EvalCrafter | GenAI-Bench | VBench |
-
+|:-----------------:|:---------------:|:--------------:|:-----------:|:-----------:|:----------:|
 | MantisScore (reg) | **278.3** | 75.7 | **51.1** | **78.5** | **73.0** |
 | MantisScore (gen) | 222.4 | **77.1** | 27.6 | 59.0 | 58.7 |
 | Gemini-1.5-Pro | <u>158.8</u> | 22.1 | 22.9 | 60.9 | 52.9 |
@@ -56,6 +56,7 @@ we employ the model's output to predict preferences and use pairwise accuracy as
 | Kosmos-2 | - | - | - | - | - |
 | CogVLM | - | - | - | - | - |
 | OpenFlamingo | - | - | - | - | - |
+
 The best in MantisScore series is in bold and the best in baselines is underlined.
 "-" means the answer of MLLM is meaningless or in wrong format.
 
```
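The diff's context lines describe the pairwise-accuracy metric: for GenAI-Bench and VBench, the model's scores for two candidate videos are compared against the recorded human preference, and the fraction of correctly ordered pairs is reported. Below is a minimal sketch of that metric, assuming per-video scalar scores; `pairwise_accuracy` and the sample numbers are hypothetical illustrations, not the repository's actual evaluation code.

```python
# Minimal sketch of pairwise accuracy over human preference pairs.
# Assumes each record holds (score of the human-preferred video,
# score of the other video) as produced by some scoring model.

def pairwise_accuracy(pairs):
    """Return the fraction of pairs where the preferred video scores higher."""
    correct = sum(1 for preferred, other in pairs if preferred > other)
    return correct / len(pairs)

# Hypothetical scores: the model orders 2 of 3 pairs correctly.
print(pairwise_accuracy([(0.9, 0.4), (0.3, 0.7), (0.8, 0.2)]))  # 0.666...
```

Note that the table's Final Sum Score column is the sum of the four benchmark columns (e.g., 75.7 + 51.1 + 78.5 + 73.0 = 278.3 for MantisScore (reg)).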