Model Summary

MobileVLM V2 is a family of vision language models that significantly improves upon MobileVLM. It shows that a careful orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and the curation of rich, high-quality datasets can substantially benefit VLM performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, the MobileVLM_V2-3B model outperforms a wide variety of VLMs at the 7B+ scale.

MobileVLM_V2-1.7B is built on our MobileLLaMA-1.4B-Chat to facilitate off-the-shelf deployment.

Model Sources

How to Get Started with the Model

Inference examples can be found on GitHub.

GGUF
Model size: 297M params
Architecture: clip
Available precisions: 4-bit, 16-bit
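As a rough guide to what the listed 297M parameters mean on disk, the weight storage is approximately parameters × bits per weight ÷ 8. The sketch below assumes only that formula. It ignores GGUF metadata and non-quantized tensors. The "~4.5 bits/weight" row is an assumption based on a Q4_0-style layout (32 four-bit weights plus one 16-bit scale per block); the actual 4-bit format used in this repository is not stated here.

```python
def approx_weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate size of the weight tensors alone: params * bits / 8.

    For block-quantized formats, pass the effective bits per weight
    (including per-block scales) rather than the nominal bit width.
    Metadata and any tensors kept at higher precision are ignored.
    """
    return n_params * bits_per_weight / 8


N_PARAMS = 297_000_000  # "297M params", as listed above

variants = [
    ("16-bit", 16.0),
    ("4-bit, nominal", 4.0),
    # Assumption: Q4_0-style blocks of 32 weights + one fp16 scale = 4.5 bits/weight.
    ("4-bit, Q4_0-style effective", 4.5),
]

for label, bits in variants:
    megabytes = approx_weight_bytes(N_PARAMS, bits) / 1e6
    print(f"{label}: ~{megabytes:.1f} MB")
```

Real file sizes will differ somewhat, so treat these numbers as order-of-magnitude estimates for planning memory on a mobile device.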
