---
license: apache-2.0
tags:
- MobileVLM V2
---
|
## Model Summary
|
MobileVLM V2 is a family of vision language models that significantly improves upon MobileVLM. It demonstrates that a careful combination of novel architectural design, a training scheme tailored for mobile VLMs, and rich, high-quality dataset curation can substantially improve VLM performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, the MobileVLM_V2-3B model outperforms a large variety of VLMs at the 7B+ scale.
|
|
|
MobileVLM_V2-3B was built on our [MobileLLaMA-2.7B-Chat](https://huggingface.co/mtgv/MobileLLaMA-2.7B-Chat) to facilitate off-the-shelf deployment.
|
|
|
## Model Sources

- Repository: https://github.com/Meituan-AutoML/MobileVLM

- Paper: [MobileVLM V2: Faster and Stronger Baseline for Vision Language Model](https://arxiv.org/abs/2402.03766)
|
|
|
## How to Get Started with the Model

Inference examples can be found in the [GitHub repository](https://github.com/Meituan-AutoML/MobileVLM).
|