---
license: apache-2.0
language:
- en
- zh
pipeline_tag: image-text-to-text
tags:
- multimodal
library_name: transformers
---

We evaluated [XinYuan-VL-2B](https://huggingface.co/thomas-yanxin/XinYuan-VL-2B) with the [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) toolkit on the benchmarks below. XinYuan-VL-2B scores higher than Alibaba Cloud's [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) on 12 of the 18 benchmarks. On 11 of them it also beats MiniCPM-2B and InternVL-2B, two other widely used open-source models of comparable parameter scale.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6299c90ef1f2a097fcaa1293/7ThTCYfd_lDzsvaFLlUv2.png)


| Benchmark | MiniCPM-2B | InternVL-2B | Qwen2-VL-2B | XinYuan-VL-2B |
| :---: | :---: | :---: | :---: | :---: |
| MMB-CN-V11-Test | 64.5 | 68.9 | 71.2 | 74.3 |
| MMB-EN-V11-Test | 65.8 | 70.2 | 73.2 | 76.5 |
| MMB-EN | 69.1 | 74.4 | 74.3 | 78.9 |
| MMB-CN | 66.5 | 71.2 | 73.8 | 76.12 |
| CCBench | 45.3 | 74.7 | 53.7 | 55.5 |
| MMT-Bench | 53.5 | 50.8 | 54.5 | 55.2 |
| RealWorld | 55.8 | 57.3 | 62.9 | 63.9 |
| SEEDBench_IMG | 67.1 | 70.9 | 72.86 | 73.4 |
| AI2D | 56.3 | 74.1 | 74.7 | 74.2 |
| MMMU | 38.2 | 36.3 | 41.1 | 40.9 |
| HallusionBench | 36.2 | 36.2 | 42.4 | 55.00 |
| POPE | 86.3 | 86.3 | 86.82 | 89.42 |
| MME | 1808.6 | 1876.8 | 1872.0 | 1854.9 |
| MMStar | 39.1 | 49.8 | 47.5 | 51.87 |
| SEEDBench2_Plus | 51.9 | 59.9 | 62.23 | 62.98 |
| BLINK | 41.2 | 42.8 | 43.92 | 42.98 |
| OCRBench | 605 | 781 | 794 | 782 |
| TextVQA | 74.1 | 73.4 | 79.7 | 77.6 |

MME is reported as a total score and OCRBench as a score out of 1000. All other benchmarks are on a 0-100 scale.
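To check the headline comparison against the table, the following minimal sketch copies the scores into Python and counts, benchmark by benchmark, where XinYuan-VL-2B scores higher. It is an illustration only and is not part of VLMEvalKit. The names `TABLE`, `head_to_head`, and `best_in_row` are hypothetical helpers introduced here. Because the benchmarks use different score scales, the code compares scores only within a row, never across rows.

```python
# Hypothetical helpers for re-checking the benchmark table; not part of VLMEvalKit.
MODELS = ("MiniCPM-2B", "InternVL-2B", "Qwen2-VL-2B", "XinYuan-VL-2B")

# Rows copied from the benchmark table, in the column order of MODELS.
TABLE = {
    "MMB-CN-V11-Test": (64.5, 68.9, 71.2, 74.3),
    "MMB-EN-V11-Test": (65.8, 70.2, 73.2, 76.5),
    "MMB-EN": (69.1, 74.4, 74.3, 78.9),
    "MMB-CN": (66.5, 71.2, 73.8, 76.12),
    "CCBench": (45.3, 74.7, 53.7, 55.5),
    "MMT-Bench": (53.5, 50.8, 54.5, 55.2),
    "RealWorld": (55.8, 57.3, 62.9, 63.9),
    "SEEDBench_IMG": (67.1, 70.9, 72.86, 73.4),
    "AI2D": (56.3, 74.1, 74.7, 74.2),
    "MMMU": (38.2, 36.3, 41.1, 40.9),
    "HallusionBench": (36.2, 36.2, 42.4, 55.00),
    "POPE": (86.3, 86.3, 86.82, 89.42),
    "MME": (1808.6, 1876.8, 1872.0, 1854.9),
    "MMStar": (39.1, 49.8, 47.5, 51.87),
    "SEEDBench2_Plus": (51.9, 59.9, 62.23, 62.98),
    "BLINK": (41.2, 42.8, 43.92, 42.98),
    "OCRBench": (605, 781, 794, 782),
    "TextVQA": (74.1, 73.4, 79.7, 77.6),
}


def head_to_head(table, model, rival):
    """Split benchmarks by whether `model` scores above or below `rival` (same row only)."""
    i, j = MODELS.index(model), MODELS.index(rival)
    wins = [name for name, row in table.items() if row[i] > row[j]]
    losses = [name for name, row in table.items() if row[i] < row[j]]
    return wins, losses


def best_in_row(table, model):
    """Benchmarks where `model` has the strictly highest score of all listed models."""
    i = MODELS.index(model)
    return [
        name
        for name, row in table.items()
        if all(row[i] > score for k, score in enumerate(row) if k != i)
    ]


if __name__ == "__main__":
    wins, losses = head_to_head(TABLE, "XinYuan-VL-2B", "Qwen2-VL-2B")
    print(f"Higher than Qwen2-VL-2B on {len(wins)} of {len(TABLE)} benchmarks")
    print("Lower than Qwen2-VL-2B on:", ", ".join(losses))
    print("Best of all four models on", len(best_in_row(TABLE, "XinYuan-VL-2B")), "benchmarks")
```

With the scores as listed, the script reports 12 of 18 benchmarks against Qwen2-VL-2B. The six lower results are AI2D, MMMU, MME, BLINK, OCRBench, and TextVQA. XinYuan-VL-2B has the top score of all four models on 11 benchmarks.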