---
base_model: xtuner/llava-llama-3-8b-v1_1-transformers
library_name: gguf
quantized_by: city96
tags:
- image-text-to-text
---
This is an imatrix GGUF conversion of [xtuner/llava-llama-3-8b-v1_1-transformers](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers).

It is mainly intended to be used as the text encoder for Hunyuan Video, but it can also be used for vision tasks with the [mmproj](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf/blob/main/llava-llama-3-8b-v1_1-mmproj-f16.gguf) file from the xtuner gguf repository.

The imatrix dataset used was [`calibration_datav3.txt`](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8) by [Bartowski](https://huggingface.co/bartowski). It was applied to all quants under Q6_K. In testing, it outperformed both a wikitext imatrix and no imatrix.

Note that the `vocab_size` differs between the [transformers](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers) (128 320) and the [hf](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf) (128 256) repositories. This conversion uses the former, as that is what the official Hunyuan Video code uses.

*IQ quants will be slow in ComfyUI because they use a numpy fallback.*
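
To check which `vocab_size` a downloaded file actually carries, the token list stored in the GGUF metadata can be counted. The following is a minimal sketch using only the Python standard library, not the method used to produce these files. It assumes the little-endian GGUF v2/v3 layout and that the vocabulary is stored under the `tokenizer.ggml.tokens` key; the function names are hypothetical.

```python
import io
import struct

# Vocabulary sizes quoted in this card.
VOCAB_TRANSFORMERS = 128320
VOCAB_HF = 128256

# Byte widths of the fixed-size GGUF metadata value types (type id -> size).
_FIXED = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one little-endian value described by a struct format character."""
    size = struct.calcsize("<" + fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack("<" + fmt, data)[0]


def _read_string(f):
    """Read a GGUF string: a uint64 length followed by UTF-8 bytes."""
    length = _read(f, "Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _skip_value(f, vtype):
    """Skip one metadata value; return the element count if it is an array."""
    if vtype in _FIXED:
        f.seek(_FIXED[vtype], io.SEEK_CUR)
        return None
    if vtype == _STRING:
        f.seek(_read(f, "Q"), io.SEEK_CUR)
        return None
    if vtype == _ARRAY:
        item_type = _read(f, "I")
        count = _read(f, "Q")
        if item_type in _FIXED:
            f.seek(_FIXED[item_type] * count, io.SEEK_CUR)
        else:
            for _ in range(count):
                _skip_value(f, item_type)
        return count
    raise ValueError(f"unknown GGUF value type {vtype}")


def gguf_vocab_size(f):
    """Return the length of tokenizer.ggml.tokens, or None if the key is absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "I")
    if version < 2:
        raise ValueError("GGUF v1 uses a different layout")
    _read(f, "Q")  # tensor count, not needed here
    kv_count = _read(f, "Q")
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "I")
        count = _skip_value(f, vtype)
        if key == "tokenizer.ggml.tokens":
            return count
    return None
```

Opening a downloaded quant in binary mode and passing the file object to `gguf_vocab_size` should return a value equal to `VOCAB_TRANSFORMERS` for the files in this repository, if the assumptions above hold.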