Update README.md
README.md (changed)
@@ -41,32 +41,32 @@ with real videos excluded.
The evaluation results are shown below:

| metric | Final Avg Score | VideoFeedback-test | EvalCrafter | GenAI-Bench | VBench |
|:-----------------:|:--------------:|:--------------:|:-----------:|:-----------:|:----------:|
| MantisScore (reg) | **69.6** | 75.7 | **51.1** | **78.5** | **73.0** |
| MantisScore (gen) | 55.6 | **77.1** | 27.6 | 59.0 | 58.7 |
| Gemini-1.5-Pro | <u>39.7</u> | 22.1 | 22.9 | 60.9 | 52.9 |
| Gemini-1.5-Flash | 39.4 | 20.8 | 17.3 | <u>67.1</u> | 52.3 |
| GPT-4o | 38.9 | <u>23.1</u> | 28.7 | 52.0 | 51.7 |
| CLIP-sim | 31.7 | 8.9 | <u>36.2</u> | 34.2 | 47.4 |
| DINO-sim | 30.3 | 7.5 | 32.1 | 38.5 | 43.3 |
| SSIM-sim | 29.5 | 13.4 | 26.9 | 34.1 | 43.5 |
| CLIP-Score | 28.6 | -7.2 | 21.7 | 45.0 | 54.9 |
| LLaVA-1.5-7B | 27.1 | 8.5 | 10.5 | 49.9 | 39.4 |
| LLaVA-1.6-7B | 23.3 | -3.1 | 13.2 | 44.5 | 38.7 |
| X-CLIP-Score | 23.2 | -1.9 | 13.3 | 41.4 | 40.1 |
| PIQE | 19.6 | -10.1 | -1.2 | 34.5 | <u>55.1</u> |
| BRISQUE | 19.0 | -20.3 | 3.9 | 38.5 | 53.7 |
| Idefics2 | 18.3 | 6.5 | 0.3 | 34.6 | 31.7 |
| SSIM-dyn | 10.6 | -5.5 | -17.0 | 28.4 | 36.5 |
| MSE-dyn | 9.2 | -12.9 | -26.4 | 31.4 | 44.5 |

<!-- | Fuyu | - | - | - | - | - |
| Kosmos-2 | - | - | - | - | - |
| CogVLM | - | - | - | - | - |
| OpenFlamingo | - | - | - | - | - | -->

The best score in the MantisScore series is shown in bold, and the best among the baselines is underlined.

<!-- "-" means the MLLM's answer is meaningless or in the wrong format. -->

## Usage
### Installation