jzsues/llava-qwen1.5-4b-chat

Visual Question Answering

text-generation


Model

llava-qwen1.5-4b-chat is a lightweight multimodal model based on the LLaVA architecture.

Language Model: Qwen/Qwen1.5-4B-Chat
Vision Encoder: google/siglip-so400m-patch14-384
Total Parameters: 4,388,102,720
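In a LLaVA-style model, the vision encoder turns an image into a grid of patch features, a small projector maps each patch feature into the language model's embedding space, and the projected patches are fed to the LLM as extra input tokens. The sketch below works through that arithmetic for the components listed above. It is illustrative only: the encoder hidden size (1152), the Qwen1.5-4B hidden size (2560), and the two-layer MLP projector (Linear, GELU, Linear, as in LLaVA-1.5) are assumptions not stated in this card, and the helper names are hypothetical. Check the repository's config before relying on them.

```python
# Hedged sketch of LLaVA-style visual-token and projector arithmetic.
# ASSUMPTIONS (not stated in this card): SigLIP so400m hidden size = 1152,
# Qwen1.5-4B hidden size = 2560, and a LLaVA-1.5-style projector
# Linear(d_in, d_out) -> GELU -> Linear(d_out, d_out).

IMAGE_SIZE = 384      # from the encoder name "...-patch14-384"
PATCH_SIZE = 14       # from the encoder name "...-patch14-..."
VISION_HIDDEN = 1152  # assumed SigLIP so400m width
LM_HIDDEN = 2560      # assumed Qwen1.5-4B hidden size


def num_visual_tokens(image_size: int, patch_size: int) -> int:
    """Patch count for a ViT with non-overlapping patches and no CLS token."""
    per_side = image_size // patch_size  # 384 // 14 = 27; leftover pixels are dropped
    return per_side * per_side


def mlp2x_projector_params(d_in: int, d_out: int) -> int:
    """Weights plus biases of Linear(d_in, d_out) -> GELU -> Linear(d_out, d_out).

    GELU has no parameters, so only the two linear layers contribute.
    """
    first = d_in * d_out + d_out
    second = d_out * d_out + d_out
    return first + second


if __name__ == "__main__":
    tokens = num_visual_tokens(IMAGE_SIZE, PATCH_SIZE)
    proj = mlp2x_projector_params(VISION_HIDDEN, LM_HIDDEN)
    print(f"visual tokens per image: {tokens}")  # 729
    print(f"projector parameters: {proj:,}")     # 9,507,840
```

Under these assumptions each image costs 729 tokens of the language model's context, and the projector is a tiny fraction of the 4.39B total; almost all parameters sit in the language model and the vision encoder.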

Evaluation

MMBench

Model	MMBench Test (EN)	MMBench Dev (EN)	MMBench Test (CN)	MMBench Dev (CN)	CCBench Dev
LLaVA-v1.5-7B	67.7	69.2	61.0	59.7	28.4
LLaVA-InternLM-7B	69.0	68.5	66.7	63.8	37.3
LLaVA-InternLM2-7B	73.3	74.6	71.7	72.0	42.5
Bunny-3B	69.2	68.6	-	-	-
MiniCPM-V	64.1	67.9	62.6	65.3	41.4
llava-qwen1.5-4b-chat	69.6	69.2	68.6	68.3	41.0
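To read the table per benchmark, the snippet below copies its scores and computes where llava-qwen1.5-4b-chat places in each column. It is a minimal sketch: "-" entries are treated as not reported and excluded, and ties share a rank (competition ranking, 1 + the number of strictly higher scores). The function names are hypothetical helpers, not part of any evaluation toolkit.

```python
from typing import Dict, Optional

# Scores copied from the evaluation table; None marks "-" (not reported).
COLUMNS = ["MMBench Test (EN)", "MMBench Dev (EN)", "MMBench Test (CN)",
           "MMBench Dev (CN)", "CCBench Dev"]

SCORES = {
    "LLaVA-v1.5-7B":         [67.7, 69.2, 61.0, 59.7, 28.4],
    "LLaVA-InternLM-7B":     [69.0, 68.5, 66.7, 63.8, 37.3],
    "LLaVA-InternLM2-7B":    [73.3, 74.6, 71.7, 72.0, 42.5],
    "Bunny-3B":              [69.2, 68.6, None, None, None],
    "MiniCPM-V":             [64.1, 67.9, 62.6, 65.3, 41.4],
    "llava-qwen1.5-4b-chat": [69.6, 69.2, 68.6, 68.3, 41.0],
}


def rank_in_column(model: str, col: int) -> Optional[int]:
    """Competition rank (1 = best) of `model` in column `col`.

    Models without a reported score are ignored; returns None if `model`
    itself has no score in that column.
    """
    score = SCORES[model][col]
    if score is None:
        return None
    higher = sum(
        1 for row in SCORES.values()
        if row[col] is not None and row[col] > score
    )
    return 1 + higher


def ranks(model: str) -> Dict[str, Optional[int]]:
    """Rank of `model` in every benchmark column."""
    return {name: rank_in_column(model, i) for i, name in enumerate(COLUMNS)}


if __name__ == "__main__":
    for name, r in ranks("llava-qwen1.5-4b-chat").items():
        print(f"{name}: rank {r}")
```

On these numbers llava-qwen1.5-4b-chat places second on all four MMBench splits (sharing Dev (EN) with LLaVA-v1.5-7B at 69.2) and third on CCBench Dev, behind LLaVA-InternLM2-7B and MiniCPM-V.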

Uses

TBD

Training Details

TBD


Format: Safetensors
Model size: 4.39B params
Tensor type: BF16
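A rough lower bound on the memory needed just to hold the weights follows from the parameter count and the BF16 tensor type (2 bytes per parameter). This is a back-of-the-envelope sketch: it ignores activations, the KV cache, image features, and framework overhead, so real inference memory will be higher.

```python
TOTAL_PARAMS = 4_388_102_720  # "Total Parameters" from this card
BYTES_PER_BF16 = 2            # bfloat16 is 16 bits


def weight_bytes(n_params: int, bytes_per_param: int) -> int:
    """Bytes needed to store the raw weights only."""
    return n_params * bytes_per_param


if __name__ == "__main__":
    b = weight_bytes(TOTAL_PARAMS, BYTES_PER_BF16)
    # prints: 8,776,205,440 bytes = 8.78 GB = 8.17 GiB
    print(f"{b:,} bytes = {b / 1e9:.2f} GB = {b / 2**30:.2f} GiB")
```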
