Update README.md
README.md
CHANGED
@@ -32,7 +32,16 @@ For the first two benchmarks, we take Spearman correlation between the model's output
 averaged among all the evaluation aspects as the indicator.
 For GenAI-Bench and VBench, which include human preference data among two or more videos,
 we employ the model's output to predict preferences and use pairwise accuracy as the performance indicator.
-
+
+Moreover, we use [MantisScore](https://huggingface.co/TIGER-Lab/MantisScore) trained on the VideoFeedback dataset
+for the VideoFeedback-test set, while for the other three benchmarks we use the
+[MantisScore-anno-only](https://huggingface.co/TIGER-Lab/MantisScore-anno-only) variant, trained on the VideoFeedback dataset
+with real videos excluded.
+
+The evaluation results are shown below:
+
+| metric            | Final Sum Score | VideoFeedback-test | EvalCrafter | GenAI-Bench | VBench     |
 |:-----------------:|:---------------:|:------------------:|:-----------:|:-----------:|:----------:|
 | MantisScore (reg) | **278.3**       | 75.7               | **51.1**    | **78.5**    | **73.0**   |
 | MantisScore (gen) | 222.4           | **77.1**           | 27.6        | 59.0        | 58.7       |
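For reference, the two protocols described in the changed section reduce to standard computations. Below is a minimal sketch, assuming `scipy` is installed; the score layouts, helper names, and toy numbers are illustrative and are not the repository's actual evaluation code.

```python
# Sketch of the two evaluation protocols: per-aspect Spearman correlation
# averaged over aspects, and pairwise preference accuracy.
# Assumptions (not from the repo): scores are plain Python lists keyed by
# aspect name; names and data here are illustrative only.
from scipy.stats import spearmanr


def avg_spearman(model_scores: dict, human_scores: dict) -> float:
    """Spearman correlation per evaluation aspect, averaged over aspects
    (protocol used for VideoFeedback-test and EvalCrafter)."""
    rhos = []
    for aspect, preds in model_scores.items():
        rho, _ = spearmanr(preds, human_scores[aspect])
        rhos.append(rho)
    return sum(rhos) / len(rhos)


def pairwise_accuracy(model_scores: list, human_pref_pairs: list) -> float:
    """Fraction of video pairs where the model's score ordering matches the
    human preference (protocol used for GenAI-Bench and VBench).
    Each pair is (index of preferred video, index of the other video)."""
    correct = sum(
        1 for winner, loser in human_pref_pairs
        if model_scores[winner] > model_scores[loser]
    )
    return correct / len(human_pref_pairs)


if __name__ == "__main__":
    # Toy numbers, for illustration only.
    model = {"visual quality": [3.2, 1.5, 4.0], "text alignment": [2.0, 3.5, 1.0]}
    human = {"visual quality": [3, 1, 4], "text alignment": [2, 4, 1]}
    print(f"avg Spearman: {avg_spearman(model, human):.3f}")

    scores = [0.81, 0.40, 0.65]       # one overall model score per video
    prefs = [(0, 1), (2, 1), (0, 2)]  # human-preferred video vs. the other
    print(f"pairwise accuracy: {pairwise_accuracy(scores, prefs):.3f}")
```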