BAAI/Aquila-VL-2B-llava-qwen

@@ -14,33 +14,45 @@ The Aquila-VL-2B model is a vision-language model (VLM) trained based on the [LL
 The model was trained on our self-built Infinity-MM dataset, which contains approximately 40 million image-text pairs. This dataset is a combination of open-source data collected from the internet and synthetic instruction data generated using open-source VLM models.
-We plan to open-source the Infinity-MM dataset, training scripts, and related resources in the near future. For more technical details, stay tuned for our upcoming technical report.
 # Evaluation
-We evaluated the model using the [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) tool. Whenever possible, we prioritized using the GPT-4 API for test sets that support API-based evaluation.
-| Test sets       |   MiniCPM-V-2 |   InternVL2-2B |   XinYuan-VL-2B |   Qwen2-VL-2B-Instruct |   Aquila-VL-2B |
-|:----------------:|:--------------:|:---------------:|:----------------:|:-----------------------:|:-----------------:|
-| MMMU\_DEV\_VAL    |         39.56 |          34.89 |           43.56 |                  41.67 |            **45.89** |
-| MMStar          |         41.6  |          50.2  |           51.87 |                  47.8  |            **54.4**  |
-| MMBench\_V11     |         65.2  |          69.72 |           **75.41** |                  72.7  |            72.63 |
-| MathVista\_MINI  |         39    |          45    |           47.1  |                  47.9  |            **57.8**  |
-| HallusionBench  |         36.83 |          38.06 |           36.03 |                  41.52 |            **42.64** |
-| OCRBench        |        613    |         784    |          782    |                **810**  |           776    |
-| AI2D\_TEST       |         64.8  |          74.38 |           74.22 |                  **74.64** |            74.38 |
-| MMVet           |         44.04 |          41.1  |           42.66 |                  **50.73** |            44.27 |
-| DocVQA\_TEST     |         71.02 |          86.87 |           87.63 |                  **89.87** |           84.62   |
-| ChartQA\_TEST    |         59.64 |          71.4  |           57.08 |                  73.52 |            **76.56** |
-| TextVQA\_VAL     |         74.3  |          73.49 |           77.61 |                  **79.9**  |            76.13 |
-| VCR\_EN\_EASY\_ALL |         27.61 |          51.59 |           67.71 |                  68.26 |           **73.33**    |
-| RealWorldQA     |         55.42 |          57.25 |           63.92 |                  62.61 |           **64.71**    |
-| MMBench\_TEST\_EN |         69.39 |          73.37 |           **78.87** |                  74.94 |           77.75    |
-| MMBench\_TEST\_CN |         65.86 |          70.85 |           **76.12** |                  73.93 |            72.25 |
-| MMT-Bench\_ALL   |         54.46 |          53.31 |           **57.24** |                  54.78 |           56.19    |
-| MathVision      |         15.43 |          12.6  |           16.32 |                  17.47 |            **18.52** |
-| OCRVQA\_TESTCORE |         54.43 |          40.23 |           67.64 |                  **68.68** |            63.83 |
-|Average| 52.22        |56.82        |61.07        |62.33|        **62.97** |
@@ -50,8 +62,9 @@ For comparison models, evaluations were conducted in a local environment, so the
 * We plan to train models of various sizes.
 * Future training will incorporate multi-image and video data.
-* We will open-source the Infinity-MM dataset and training code.
-* A comprehensive technical report will be released.
 # Disclaimer
-The resources, including code, data, and model weights, associated with this project are restricted for academic research purposes only and cannot be used for commercial purposes. The content produced by the model is influenced by uncontrollable variables such as randomness, and therefore, the accuracy of the output cannot be guaranteed by this project. This project does not accept any legal liability for the content of the model output, nor does it assume responsibility for any losses incurred due to the use of associated resources and output results.

 The model was trained on our self-built Infinity-MM dataset, which contains approximately 40 million image-text pairs. This dataset is a combination of open-source data collected from the internet and synthetic instruction data generated using open-source VLM models.
+We have open-sourced the Infinity-MM dataset and related resources. We hope you enjoy using them!
+## News
+- `2024/10/25`: The **Aquila-VL-2B** model and the Infinity-MM dataset are now available. The technical report was released at the same time.
+<!-- We plan to open-source the Infinity-MM dataset, training scripts, and related resources in the near future. For more technical details, stay tuned for our upcoming technical report. -->
 # Evaluation
+We evaluated the model using the [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) tool. Whenever possible, we prioritized using the OpenAI API for test sets that support API-based evaluation.
+| Benchmark                    | MiniCPM-V-2 | InternVL2-2B | XinYuan-VL-2B | Qwen2-VL-2B-Instruct | Aquila-VL-2B |
+| :--------------------------- | :---------: | :----------: | :-----------: | :------------------: | :----------: |
+| MMBench-EN<sub>test</sub>    |    69.4     |     73.4     |   **78.9**    |         74.9         |     78.8     |
+| MMBench-CN<sub>test</sub>    |    65.9     |     70.9     |     76.1      |         73.9         |   **76.4**   |
+| MMBench_V1.1<sub>test</sub>  |    65.2     |     69.7     |   **75.4**    |         72.7         |     75.2     |
+| MMT-Bench<sub>test</sub>     |    54.5     |     53.3     |     57.2      |         54.8         |   **58.2**   |
+| RealWorldQA                  |    55.4     |     57.3     |     63.9      |         62.6         |   **63.9**   |
+| HallusionBench               |    36.8     |     38.1     |     36.0      |         41.5         |   **43.0**   |
+| SEEDBench2<sub>plus</sub>    |    51.8     |     60.0     |     63.0      |         62.4         |   **63.0**   |
+| LLaVABench                   |    66.1     |     64.8     |     42.4      |         52.5         |   **68.4**   |
+| MMStar                       |    41.6     |     50.2     |     51.9      |         47.8         |   **54.9**   |
+| POPE                         |    86.6     |     85.3     |   **89.4**    |         88.0         |     83.6     |
+| MMVet                        |    44.0     |     41.1     |     42.7      |       **50.7**       |     44.3     |
+| MMMU<sub>val</sub>           |    39.6     |     34.9     |     43.6      |         41.7         |   **47.4**   |
+| ScienceQA<sub>test</sub>     |    80.4     |     94.1     |     86.6      |         78.1         |   **95.2**   |
+| AI2D<sub>test</sub>          |    64.8     |     74.4     |     74.2      |         74.6         |   **75.0**   |
+| MathVista<sub>testmini</sub> |    39.0     |     45.0     |     47.1      |         47.9         |   **59.0**   |
+| MathVerse<sub>testmini</sub> |    19.8     |     24.7     |     22.2      |         21.0         |   **26.2**   |
+| MathVision                   |    15.4     |     12.6     |     16.3      |         17.5         |   **18.4**   |
+| DocVQA<sub>test</sub>        |    71.0     |     86.9     |     87.6      |       **89.9**       |     85.0     |
+| InfoVQA<sub>test</sub>       |    40.0     |     59.5     |     59.1      |       **65.4**       |     58.3     |
+| ChartQA<sub>test</sub>       |    59.6     |     71.4     |     57.1      |         73.5         |   **76.5**   |
+| TextVQA<sub>val</sub>        |    74.3     |     73.5     |     77.6      |       **79.9**       |     76.4     |
+| OCRVQA<sub>testcore</sub>    |    54.4     |     40.2     |     67.6      |       **68.7**       |     64.0     |
+| VCR<sub>en easy</sub>        |    27.6     |     51.6     |     67.7      |         68.3         |   **70.0**   |
+| OCRBench                     |     613     |     784      |      782      |       **810**        |     772      |
+| Average                      |    53.5     |     58.8     |     60.9      |         62.1         |   **64.1**   |
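The `Average` row in the updated table mixes benchmarks scored out of 100 with OCRBench, which is scored out of 1000. The README does not say how the average is computed, so the sketch below rests on an assumption: OCRBench is divided by 10 before taking a plain mean over all 24 benchmarks. Under that assumption the Aquila-VL-2B column reproduces the reported 64.1. The function name `normalized_average` and the `rescale` parameter are illustrative and not part of VLMEvalKit.

```python
# Aquila-VL-2B scores copied from the updated evaluation table.
aquila_scores = {
    "MMBench-EN_test": 78.8,
    "MMBench-CN_test": 76.4,
    "MMBench_V1.1_test": 75.2,
    "MMT-Bench_test": 58.2,
    "RealWorldQA": 63.9,
    "HallusionBench": 43.0,
    "SEEDBench2_plus": 63.0,
    "LLaVABench": 68.4,
    "MMStar": 54.9,
    "POPE": 83.6,
    "MMVet": 44.3,
    "MMMU_val": 47.4,
    "ScienceQA_test": 95.2,
    "AI2D_test": 75.0,
    "MathVista_testmini": 59.0,
    "MathVerse_testmini": 26.2,
    "MathVision": 18.4,
    "DocVQA_test": 85.0,
    "InfoVQA_test": 58.3,
    "ChartQA_test": 76.5,
    "TextVQA_val": 76.4,
    "OCRVQA_testcore": 64.0,
    "VCR_en_easy": 70.0,
    "OCRBench": 772,  # reported on a 0-1000 scale
}


def normalized_average(scores, rescale=None):
    """Mean of the scores after dividing selected benchmarks by a scale factor.

    `rescale` maps a benchmark name to its divisor. This is an assumed
    convention for illustration, not a method documented in the README.
    """
    rescale = rescale or {}
    values = [value / rescale.get(name, 1.0) for name, value in scores.items()]
    return sum(values) / len(values)


# Assumption: OCRBench is brought to a 0-100 scale by dividing by 10.
avg = normalized_average(aquila_scores, rescale={"OCRBench": 10.0})
print(f"{avg:.1f}")  # 64.1
```

The same calculation applied to the Qwen2-VL-2B-Instruct column gives 62.1, which also matches the table, so the rescaling assumption appears consistent with the reported averages.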
 * We plan to train models of various sizes.
 * Future training will incorporate multi-image and video data.
+<!-- * A comprehensive technical report will be released. -->
+<!-- * We will open-source the Infinity-MM dataset and training code. -->
 # Disclaimer
+The resources, including code, data, and model weights, associated with this project are restricted for academic research purposes only and cannot be used for commercial purposes. The content produced by the model is influenced by uncontrollable variables such as randomness, and therefore, the accuracy of the output cannot be guaranteed by this project. This project does not accept any legal liability for the content of the model output, nor does it assume responsibility for any losses incurred due to the use of associated resources and output results.