Model Card for Llava-Phi2

This is a multimodal implementation of the Phi-2 model, inspired by LLaVA-Phi.

Model Details

  1. LLM Backbone: Phi-2
  2. Vision Tower: clip-vit-large-patch14-336
  3. Pretraining Dataset: LAION-CC-SBU dataset with BLIP captions (200k samples)
  4. Finetuning Dataset: Instruct 150k dataset based on COCO
  5. Finetuned Model: GunaKoppula/Llava-Phi2
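
For orientation, here is a minimal sketch, assuming only the standard transformers API and the two publicly available base checkpoints named above; it loads the Phi-2 backbone and the CLIP vision tower side by side. It does not reproduce the trained projector that the LLaVA-Phi architecture adds between the two, and it does not load the finetuned GunaKoppula/Llava-Phi2 weights.

# Sketch only: loads the base LLM backbone and vision tower listed above,
# not the finetuned multimodal checkpoint.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    CLIPImageProcessor,
    CLIPVisionModel,
)

# Vision tower: encodes images into patch embeddings.
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")

# LLM backbone: generates text; trust_remote_code is only needed on older
# transformers versions without native Phi support.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
llm = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# In LLaVA-style models, the CLIP patch embeddings are passed through a trained
# projection layer and prepended to the text embeddings before generation.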

Model Sources

  1. Repository: https://github.com/zhuyiche/llava-phi

How to Get Started with the Model

Use the code below to get started with the model.

  1. Clone this repository and navigate to the llava-phi folder
git clone https://github.com/zhuyiche/llava-phi.git
cd llava-phi
  2. Install Package
conda create -n llava_phi python=3.10 -y
conda activate llava_phi
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  3. Run the Model
python llava_phi/eval/run_llava_phi.py --model-path="GunaKoppula/Llava-Phi2" \
    --image-file="https://huggingface.co/GunaKoppula/Llava-Phi2/resolve/main/people.jpg?download=true" \
    --query="How many people are there in the image?"
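
The --image-file argument above accepts the image URL directly. If you prefer to pass a local file instead, the short snippet below (not part of the repository; the filename people.jpg is just an example) downloads the sample image first:

import requests

# Fetch the sample image from the model repo and save it locally.
url = "https://huggingface.co/GunaKoppula/Llava-Phi2/resolve/main/people.jpg?download=true"
with open("people.jpg", "wb") as f:
    f.write(requests.get(url, timeout=30).content)

You can then pass --image-file="people.jpg" to the command above.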

Acknowledgement

This implementation is based on the wonderful work done by:
  1. LLaVA-Phi
  2. LLaVA
  3. Phi-2

Model Size

  3.09B params (Safetensors, tensor type F32)
