---
language:
- en
license: mit
library_name: transformers
datasets:
- liuhaotian/LLaVA-Instruct-150K
- liuhaotian/LLaVA-Pretrain
---
# Model Card for LLaVA-Phi-2-3B
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [LAION](https://laion.ai/), [SkunkworksAI](https://huggingface.co/SkunkworksAI) & [Ontocord](https://www.ontocord.ai/)
- **Model type:** LLaVA is an open-source chatbot trained by fine-tuning Phi-2 on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture.
- **Finetuned from model:** [Phi-2](https://huggingface.co/microsoft/phi-2)
- **License:** MIT
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** [BakLLaVa](https://github.com/SkunkworksAI/BakLLaVA)
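## Usage

A minimal usage sketch with the `transformers` library. This assumes the checkpoint loads through the standard transformers LLaVA classes (`LlavaForConditionalGeneration` and `AutoProcessor`) and follows the usual `USER: <image> ... ASSISTANT:` prompt format; the repo id below is a placeholder, and the BakLLaVA repository linked above may be required instead if the checkpoint does not ship in the transformers LLaVA format.

```python
# Hedged sketch: assumes the checkpoint is compatible with the transformers
# LLaVA integration; the repo id is a placeholder, not a published path.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "path/to/llava-phi-2-3b"  # placeholder: replace with the actual Hub repo id

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 keeps the 2.7B model within a single consumer GPU
    device_map="auto",
)

# Fetch a sample image and build a LLaVA-style prompt with an <image> placeholder.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this image?\nASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```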
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Benchmarks
| Model | Parameters | SQA | GQA | TextVQA | POPE |
| --- | --- | --- | --- | --- | --- |
| [LLaVA-1.5](https://huggingface.co/liuhaotian/llava-v1.5-7b) | 7.3B | 68.0 | **62.0** | **58.3** | 85.3 |
| [MC-LLaVA-3B](https://huggingface.co/visheratin/MC-LLaVA-3b) | 3B | - | 49.6 | 38.59 | - |
| [LLaVA-Phi](https://arxiv.org/pdf/2401.02330.pdf) | 3B | 68.4 | - | 48.6 | 85.0 |
| [moondream1](https://huggingface.co/vikhyatk/moondream1) | 1.6B | - | 56.3 | 39.8 | - |
| **llava-phi-2-3b** | 2.7B | 69.0 | 51.2 | 47.0 | 86.0 |
| **llava-phi-2-3b-siglip** | 2.7B | **70.15** | 52.56 | 47.99 | **87.00** |