Update README.md
README.md (changed)
@@ -41,32 +41,32 @@ with real videos excluded.
The evaluation results are shown below:

| metric | Final Avg Score | VideoFeedback-test | EvalCrafter | GenAI-Bench | VBench |
|:-----------------:|:--------------:|:--------------:|:-----------:|:-----------:|:----------:|
| MantisScore (reg) | **69.6** | 75.7 | **51.1** | **78.5** | **73.0** |
| MantisScore (gen) | 55.6 | **77.1** | 27.6 | 59.0 | 58.7 |
| Gemini-1.5-Pro | <u>39.7</u> | 22.1 | 22.9 | 60.9 | 52.9 |
| Gemini-1.5-Flash | 39.4 | 20.8 | 17.3 | <u>67.1</u> | 52.3 |
| GPT-4o | 38.9 | <u>23.1</u> | 28.7 | 52.0 | 51.7 |
| CLIP-sim | 31.7 | 8.9 | <u>36.2</u> | 34.2 | 47.4 |
| DINO-sim | 30.3 | 7.5 | 32.1 | 38.5 | 43.3 |
| SSIM-sim | 29.5 | 13.4 | 26.9 | 34.1 | 43.5 |
| CLIP-Score | 28.6 | -7.2 | 21.7 | 45.0 | 54.9 |
| LLaVA-1.5-7B | 27.1 | 8.5 | 10.5 | 49.9 | 39.4 |
| LLaVA-1.6-7B | 23.3 | -3.1 | 13.2 | 44.5 | 38.7 |
| X-CLIP-Score | 23.2 | -1.9 | 13.3 | 41.4 | 40.1 |
| PIQE | 19.6 | -10.1 | -1.2 | 34.5 | <u>55.1</u> |
| BRISQUE | 19.0 | -20.3 | 3.9 | 38.5 | 53.7 |
| Idefics2 | 18.3 | 6.5 | 0.3 | 34.6 | 31.7 |
| SSIM-dyn | 10.6 | -5.5 | -17.0 | 28.4 | 36.5 |
| MSE-dyn | 9.2 | -12.9 | -26.4 | 31.4 | 44.5 |

<!-- | Fuyu | - | - | - | - | - |
| Kosmos-2 | - | - | - | - | - |
| CogVLM | - | - | - | - | - |
| OpenFlamingo | - | - | - | - | - | -->

The best score in the MantisScore series is shown in bold, and the best among the baselines is underlined.

<!-- "-" means the MLLM's answer is meaningless or in the wrong format. -->

## Usage
### Installation