city96
/

llava-llama-3-8b-v1_1-imat-gguf

Image-Text-to-Text

Inference Endpoints


city96 committed 23 days ago

Commit 6bc05ae · verified · 1 Parent(s): 9b95ba3

Create README.md

Files changed (1)

README.md +17 -0

README.md ADDED

	@@ -0,0 +1,17 @@

+---
+base_model: xtuner/llava-llama-3-8b-v1_1-transformers
+library_name: gguf
+quantized_by: city96
+tags:
+- image-text-to-text
+---
+This is an imatrix GGUF conversion of [xtuner/llava-llama-3-8b-v1_1-transformers](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers).
+It is mainly intended for use as the text encoder for Hunyuan Video, but it can also be used for vision tasks together with the [mmproj](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf/blob/main/llava-llama-3-8b-v1_1-mmproj-f16.gguf) file from the xtuner GGUF repository.
+The imatrix dataset used was [`calibration_datav3.txt`](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8) by [Bartowski](https://huggingface.co/bartowski); it was applied to all quants below Q6_K. In testing, it outperformed both a wikitext-based imatrix and quantization without an imatrix.
+Note that `vocab_size` differs between the [transformers](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers) (128 320) and the [hf](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf) (128 256) repositories. This conversion is based on the former, as that is the one used in the official Hunyuan Video code.
+*IQ quants will be slow in ComfyUI because they rely on a NumPy fallback.*
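
Because the two upstream repositories differ only slightly in `vocab_size`, that value is a quick way to tell which checkpoint a given file or config was derived from. The following is a minimal sketch, assuming you have already obtained the vocabulary size yourself (for example from the `vocab_size` field of a `config.json`); the helper names `source_repo_for_vocab` and `extra_token_count` are hypothetical and not part of any library.

```python
# Vocabulary sizes as stated in the README text above.
# The mapping and helper names are illustrative assumptions, not an official API.
KNOWN_VOCAB_SIZES = {
    128320: "xtuner/llava-llama-3-8b-v1_1-transformers",
    128256: "xtuner/llava-llama-3-8b-v1_1-hf",
}


def source_repo_for_vocab(vocab_size: int) -> str:
    """Return the upstream repository whose vocab_size matches.

    Raises ValueError for any size that is not one of the two listed above.
    """
    try:
        return KNOWN_VOCAB_SIZES[vocab_size]
    except KeyError:
        raise ValueError(f"unexpected vocab_size: {vocab_size}") from None


def extra_token_count() -> int:
    """Number of additional vocabulary entries in the transformers variant."""
    return 128320 - 128256


if __name__ == "__main__":
    print(source_repo_for_vocab(128320))
    print(extra_token_count())  # 64
```

A mismatch here (for instance, pairing this conversion with a tokenizer or embedding table sized for 128 256 entries) would be worth checking before debugging anything else.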