hexuan21 committed on
Commit
e6a8d7b
·
verified ·
1 Parent(s): 1d7f958

Update README.md

  1. README.md +10 -1
README.md CHANGED
@@ -32,7 +32,16 @@ For the first two benchmarks, we take Spearman correlation between model's outpu
32
  averaged among all the evaluation aspects as indicator.
33
  For GenAI-Bench and VBench, which include human preference data among two or more videos,
34
  we employ the model's output to predict preferences and use pairwise accuracy as the performance indicator.
35
- | metric | Final Sum Score | VideoEval-test | EvalCrafter | GenAI-Bench | VBench |
35
+
36
+ Moreover, we use [MantisScore](https://huggingface.co/TIGER-Lab/MantisScore), trained on the VideoFeedback dataset,
37
+ for the VideoFeedback-test set, while for the other three benchmarks we use the
38
+ [MantisScore-anno-only](https://huggingface.co/TIGER-Lab/MantisScore-anno-only) variant, trained on the VideoFeedback dataset
39
+ with real videos excluded.
40
+
41
+ The evaluation results are shown below:
42
+
43
+
44
+ | metric | Final Sum Score | VideoFeedback-test | EvalCrafter | GenAI-Bench | VBench |
45
  |:-----------------:|:---------------:|:--------------:|:-----------:|:-----------:|:----------:|
46
  | MantisScore (reg) | **278.3** | 75.7 | **51.1** | **78.5** | **73.0** |
47
  | MantisScore (gen) | 222.4 | **77.1** | 27.6 | 59.0 | 58.7 |
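
The pairwise-accuracy indicator and the Final Sum Score column can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `pairwise_accuracy` and `final_sum_score` are hypothetical, skipping pairs that humans rate as ties is an assumption, and treating the Final Sum Score as the plain sum of the four benchmark columns is inferred only from the fact that the table's numbers add up that way.

```python
from itertools import combinations


def pairwise_accuracy(model_scores, human_scores):
    """Fraction of video pairs where the model's ordering matches the human preference.

    Assumption of this sketch: pairs that humans rate equally are skipped.
    """
    correct = 0
    total = 0
    for i, j in combinations(range(len(model_scores)), 2):
        human_diff = human_scores[i] - human_scores[j]
        if human_diff == 0:
            continue  # no human preference for this pair
        model_diff = model_scores[i] - model_scores[j]
        total += 1
        if model_diff * human_diff > 0:
            correct += 1  # model prefers the same video as the annotators
    return correct / total if total else 0.0


def final_sum_score(per_benchmark):
    """Sum the per-benchmark indicators (assumed definition, see lead-in)."""
    return round(sum(per_benchmark.values()), 1)


# Values copied from the results table.
reg = {"VideoFeedback-test": 75.7, "EvalCrafter": 51.1, "GenAI-Bench": 78.5, "VBench": 73.0}
gen = {"VideoFeedback-test": 77.1, "EvalCrafter": 27.6, "GenAI-Bench": 59.0, "VBench": 58.7}

print(final_sum_score(reg), final_sum_score(gen))
print(pairwise_accuracy([0.9, 0.2, 0.5], [3, 1, 2]))
```

Under these assumptions, the two sums reproduce the Final Sum Score column of the table for the (reg) and (gen) rows.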