Model Card for Model ID
This is a multimodal implementation of the Phi2 model, inspired by LLaVA-Phi.
Model Details
- LLM Backbone: Phi2
- Vision Tower: clip-vit-large-patch14-336
- Pretraining Dataset: LAION-CC-SBU dataset with BLIP captions (200k samples)
- Finetuning Dataset: Instruct 150k dataset based on COCO
- Finetuned Model: marianna13/llava-phi-2-3b
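To make the pairing of vision tower and LLM backbone concrete, here is a minimal sketch of how many image tokens a LLaVA-style setup would pass to the language model for a vision tower named clip-vit-large-patch14-336. The image size (336) and patch size (14) are read from the vision tower's name. Dropping the CLS token and feeding only patch tokens is an assumption based on the usual LLaVA recipe, not something this card states. The helper name `num_patch_tokens` is hypothetical.

```python
def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Return the number of non-overlapping square patches in a square image."""
    if image_size <= 0 or patch_size <= 0:
        raise ValueError("image_size and patch_size must be positive")
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side


# Values read from the vision tower name "clip-vit-large-patch14-336".
IMAGE_SIZE = 336
PATCH_SIZE = 14

# Assumption: only patch tokens (no CLS token) are projected into the LLM.
image_tokens = num_patch_tokens(IMAGE_SIZE, PATCH_SIZE)
print(f"Image tokens per picture: {image_tokens}")
```

Under these assumptions each image takes up 576 positions of the language model's context, which is worth keeping in mind when budgeting prompt length for a small backbone.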
Model Sources
- Original Repository: Llava-Phi
- Paper: LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
- Demo: Demo Link