---
base_model: xtuner/llava-llama-3-8b-v1_1-transformers
library_name: gguf
quantized_by: city96
tags:
- image-text-to-text
---
This is an imatrix GGUF conversion of [xtuner/llava-llama-3-8b-v1_1-transformers](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers).
It is mainly intended to be used as the text encoder for Hunyuan Video, but it can also be used for vision tasks together with the [mmproj](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf/blob/main/llava-llama-3-8b-v1_1-mmproj-f16.gguf) file from the xtuner GGUF repository.
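For the vision use case, a minimal sketch using the third-party `llama-cpp-python` bindings might look like the following. The model file name, the `Llava15ChatHandler` choice, and the image URL are illustrative assumptions rather than part of this repository; the exact chat handler for a Llama-3-based LLaVA model may need adjusting.

```python
# Minimal sketch (untested): vision inference via llama-cpp-python.
# File names and the chat handler choice are assumptions, not part of this repo.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# The mmproj file comes from the xtuner GGUF repository linked above.
chat_handler = Llava15ChatHandler(
    clip_model_path="llava-llama-3-8b-v1_1-mmproj-f16.gguf"
)

llm = Llama(
    model_path="llava-llama-3-8b-v1_1-Q4_K_M.gguf",  # hypothetical quant file name
    chat_handler=chat_handler,
    n_ctx=4096,  # leave room for the image embedding tokens
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant that describes images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                {"type": "text", "text": "Describe this image."},
            ],
        },
    ],
)
print(result["choices"][0]["message"]["content"])
```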
The imatrix dataset used was [`calibration_datav3.txt`](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8) by [Bartowski](https://huggingface.co/bartowski); the imatrix was applied to all quants below Q6_K. It was tested against both wikitext and no imatrix at all, and it outperformed both.
Note that the `vocab_size` differs between the [transformers](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers) (128,320) and the [hf](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf) (128,256) repositories. This conversion uses the former, as that is what the official Hunyuan Video code uses.
*IQ quants will be slow in ComfyUI due to falling back to numpy.*