
Model Card for LLaVa-Phi-2-3B-GGUF

Model Details

Model Description

Quantized version of llava-phi-2-3b. Quantization was performed with llama.cpp.

  • Developed by: LAION, SkunkworksAI & Ontocord
  • Model type: LLaVA is an open-source chatbot trained by fine-tuning Phi-2 on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture.
  • Finetuned from model: Phi-2
  • License: MIT

Usage

```shell
make && ./llava-cli -m ../ggml-model-f16.gguf --mmproj ../mmproj-model-f16.gguf --image /path/to/image.jpg
```
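To drive llava-cli from a script, you can assemble the same invocation programmatically and pass it to subprocess. This is a minimal sketch; `build_llava_cmd` is a hypothetical helper, not part of llama.cpp, and the paths mirror the example above.

```python
import shlex
import subprocess

def build_llava_cmd(model_path, mmproj_path, image_path, extra_args=None):
    """Build the llava-cli argument list (hypothetical helper, not part of llama.cpp)."""
    cmd = ["./llava-cli", "-m", model_path, "--mmproj", mmproj_path, "--image", image_path]
    if extra_args:
        cmd.extend(extra_args)
    return cmd

cmd = build_llava_cmd("../ggml-model-f16.gguf", "../mmproj-model-f16.gguf", "/path/to/image.jpg")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, capture_output=True, text=True)
```

Passing the arguments as a list (rather than a shell string) avoids quoting issues with file paths containing spaces.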

Evaluation

Benchmarks

| Model          | Parameters | SQA  | GQA  | TextVQA | POPE |
|----------------|------------|------|------|---------|------|
| LLaVA-1.5      | 7.3B       | 68.0 | 62.0 | 58.3    | 85.3 |
| MC-LLaVA-3B    | 3B         | -    | 49.6 | 38.59   | -    |
| LLaVA-Phi      | 3B         | 68.4 | -    | 48.6    | 85.0 |
| moondream1     | 1.6B       | -    | 56.3 | 39.8    | -    |
| llava-phi-2-3b | 3B         | 69.0 | 51.2 | 47.0    | 86.0 |

Image Captioning (MS COCO)

| Model          | BLEU_1 | BLEU_2 | BLEU_3 | BLEU_4 | METEOR | ROUGE_L | CIDEr | SPICE |
|----------------|--------|--------|--------|--------|--------|---------|-------|-------|
| llava-1.5-7b   | 75.8   | 59.8   | 45.0   | 33.3   | 29.4   | 57.7    | 108.8 | 23.5  |
| llava-phi-2-3b | 67.7   | 50.5   | 35.7   | 24.2   | 27.0   | 52.4    | 85.0  | 20.7  |
GGUF Details

  • Model size: 2.78B params
  • Architecture: phi2
  • Precision: 16-bit
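A quick way to sanity-check which GGUF file fits your hardware is to estimate size from parameter count and bit width. This is a rough back-of-the-envelope sketch: it ignores GGUF metadata overhead and the separate mmproj (vision projector) file.

```python
def approx_size_gb(n_params, bits_per_param):
    """Rough model file size in GB: parameters x bits per parameter,
    ignoring metadata and the separate vision projector."""
    return n_params * bits_per_param / 8 / 1e9

# 2.78B parameters at 16-bit, as listed in the GGUF details above
print(round(approx_size_gb(2.78e9, 16), 2))  # roughly 5.56 GB
```

The same formula gives a feel for lower-bit quants, e.g. a 4-bit variant of the same model would land near 1.4 GB before metadata.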
