	---
	base_model: marianna13/llava-phi-2-3b
	pipeline_tag: text-generation
	inference: false
	quantized_by: Kevin Cao
	language:
	- en
	license: mit
	library_name: transformers
	datasets:
	- liuhaotian/LLaVA-Instruct-150K
	- liuhaotian/LLaVA-Pretrain
	---

	# GGUF Quantized LLaVa Phi-2 3B

	Original model from [marianna13/llava-phi-2-3b](https://huggingface.co/marianna13/llava-phi-2-3b).

	## Provided Files

	| Name | Quant method | Bits | Size | Max RAM required | Use case |
	| ---- | ---- | ---- | ---- | ---- | ----- |
	| [ggml-model-Q2_K.gguf](https://huggingface.co/kejcao/llava-phi-2-GGUF/blob/main/ggml-model-Q2_K.gguf) | Q2_K | 2 | 1.17 GB | 3.67 GB | smallest, significant quality loss - not recommended for most purposes |
	| [ggml-model-Q3_K_S.gguf](https://huggingface.co/kejcao/llava-phi-2-GGUF/blob/main/ggml-model-Q3_K_S.gguf) | Q3_K_S | 3 | 1.25 GB | 3.75 GB | very small, high quality loss |
	| [ggml-model-Q3_K_M.gguf](https://huggingface.co/kejcao/llava-phi-2-GGUF/blob/main/ggml-model-Q3_K_M.gguf) | Q3_K_M | 3 | 1.48 GB | 3.98 GB | very small, high quality loss |
	| [ggml-model-Q4_0.gguf](https://huggingface.co/kejcao/llava-phi-2-GGUF/blob/main/ggml-model-Q4_0.gguf) | Q4_0 | 4 | 1.60 GB | 4.10 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
	| [ggml-model-Q3_K_L.gguf](https://huggingface.co/kejcao/llava-phi-2-GGUF/blob/main/ggml-model-Q3_K_L.gguf) | Q3_K_L | 3 | 1.60 GB | 4.10 GB | small, substantial quality loss |
	| [ggml-model-Q4_K_S.gguf](https://huggingface.co/kejcao/llava-phi-2-GGUF/blob/main/ggml-model-Q4_K_S.gguf) | Q4_K_S | 4 | 1.62 GB | 4.12 GB | small, greater quality loss |
	| [ggml-model-Q4_K_M.gguf](https://huggingface.co/kejcao/llava-phi-2-GGUF/blob/main/ggml-model-Q4_K_M.gguf) | Q4_K_M | 4 | 1.79 GB | 4.29 GB | medium, balanced quality - recommended |
	| [ggml-model-Q5_0.gguf](https://huggingface.co/kejcao/llava-phi-2-GGUF/blob/main/ggml-model-Q5_0.gguf) | Q5_0 | 5 | 1.93 GB | 4.43 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
	| [ggml-model-Q5_K_S.gguf](https://huggingface.co/kejcao/llava-phi-2-GGUF/blob/main/ggml-model-Q5_K_S.gguf) | Q5_K_S | 5 | 1.93 GB | 4.43 GB | large, low quality loss - recommended |
	| [ggml-model-Q5_K_M.gguf](https://huggingface.co/kejcao/llava-phi-2-GGUF/blob/main/ggml-model-Q5_K_M.gguf) | Q5_K_M | 5 | 2.07 GB | 4.57 GB | large, very low quality loss - recommended |
	| [ggml-model-Q6_K.gguf](https://huggingface.co/kejcao/llava-phi-2-GGUF/blob/main/ggml-model-Q6_K.gguf) | Q6_K | 6 | 2.29 GB | 4.79 GB | very large, extremely low quality loss |
	| [ggml-model-Q8_0.gguf](https://huggingface.co/kejcao/llava-phi-2-GGUF/blob/main/ggml-model-Q8_0.gguf) | Q8_0 | 8 | 2.96 GB | 5.46 GB | very large, extremely low quality loss - not recommended |
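
As a rough aid for choosing among the files above, the sketch below picks the largest quantization whose listed "Max RAM required" fits a given memory budget. The sizes and RAM figures are copied from the table; in every row the RAM figure equals the file size plus 2.5 GB. The names `QUANTS`, `largest_quant_that_fits`, and `gguf_filename` are hypothetical helpers made up for this illustration and are not part of any library. The selection considers only size and RAM, not the recommendations in the "Use case" column.

```python
# (quant method, file size in GB, max RAM required in GB), copied from the table above.
QUANTS = [
    ("Q2_K", 1.17, 3.67),
    ("Q3_K_S", 1.25, 3.75),
    ("Q3_K_M", 1.48, 3.98),
    ("Q4_0", 1.60, 4.10),
    ("Q3_K_L", 1.60, 4.10),
    ("Q4_K_S", 1.62, 4.12),
    ("Q4_K_M", 1.79, 4.29),
    ("Q5_0", 1.93, 4.43),
    ("Q5_K_S", 1.93, 4.43),
    ("Q5_K_M", 2.07, 4.57),
    ("Q6_K", 2.29, 4.79),
    ("Q8_0", 2.96, 5.46),
]


def largest_quant_that_fits(ram_budget_gb):
    """Return the quant method of the largest file whose listed max RAM
    fits within ram_budget_gb, or None if no file fits.

    Hypothetical helper: ties on file size resolve to the earlier table row.
    """
    fitting = [q for q in QUANTS if q[2] <= ram_budget_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: q[1])[0]


def gguf_filename(quant_method):
    """Build the file name used in this repository for a quant method."""
    return f"ggml-model-{quant_method}.gguf"


if __name__ == "__main__":
    choice = largest_quant_that_fits(4.3)
    print(choice, gguf_filename(choice))
```

Because two pairs of rows share a file size (Q4_0 with Q3_K_L, and Q5_0 with Q5_K_S), a budget that lands exactly on one of those rows returns the legacy format listed first; checking the "Use case" column before downloading is still advisable.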

	# ORIGINAL MODEL CARD

	# Model Card for LLaVa-Phi-2-3B

	<!-- Provide a quick summary of what the model is/does. -->

	## Model Details

	### Model Description

	<!-- Provide a longer summary of what this model is. -->


	- Developed by: [LAION](https://laion.ai/), [SkunkworksAI](https://huggingface.co/SkunkworksAI) & [Ontocord](https://www.ontocord.ai/)
	- Model type: LLaVA is an open-source chatbot trained by fine-tuning Phi-2 on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture.
	- Finetuned from model: [Phi-2](https://huggingface.co/microsoft/phi-2)
	- License: MIT
	- Demo: [llava-phi-2-3b-demo](https://huggingface.co/spaces/marianna13/llava-phi-2-3b-demo)

	### Model Sources

	<!-- Provide the basic links for the model. -->

	- Repository: [BakLLaVa](https://github.com/SkunkworksAI/BakLLaVA)

	## Evaluation

	<!-- This section describes the evaluation protocols and provides the results. -->

	### Benchmarks

	| Model | Parameters | SQA | GQA | TextVQA | POPE |
	| --- | --- | --- | --- | --- | --- |
	| [LLaVA-1.5](https://huggingface.co/liuhaotian/llava-v1.5-7b) | 7.3B | 68.0 | 62.0 | 58.3 | 85.3 |
	| [MC-LLaVA-3B](https://huggingface.co/visheratin/MC-LLaVA-3b) | 3B | - | 49.6 | 38.59 | - |
	| [LLaVA-Phi](https://arxiv.org/pdf/2401.02330.pdf) | 3B | 68.4 | - | 48.6 | 85.0 |
	| [moondream1](https://huggingface.co/vikhyatk/moondream1) | 1.6B | - | 56.3 | 39.8 | - |
	| llava-phi-2-3b | 3B | 69.0 | 51.2 | 47.0 | 86.0 |

	### Image Captioning (MS COCO)

	| Model | BLEU_1 | BLEU_2 | BLEU_3 | BLEU_4 | METEOR | ROUGE_L | CIDEr | SPICE |
	| -------------- | ------ | ------ | ------ | ------ | ------ | ------- | ----- | ----- |
	| llava-1.5-7b | 75.8 | 59.8 | 45 | 33.3 | 29.4 | 57.7 | 108.8 | 23.5 |
	| llava-phi-2-3b | 67.7 | 50.5 | 35.7 | 24.2 | 27.0 | 52.4 | 85.0 | 20.7 |
