Commit 8d45bb0 (parent: 8c4e1dd) by nintwentydo: Update README.md
6.0bpw quant of [Pixtral-Large-Instruct](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411).

Vision inputs are working on the dev branch of [ExLlamaV2](https://github.com/turboderp/exllamav2/tree/dev).

***21 Dec 2024:** This model has been a LOT of fun to experiment and learn with. The model card below has been updated with the changes made to this repo over the last week.*

## Architecture Differences to Pixtral 12B
Pixtral 12B has bias keys for the multi_modal_projector layers, whereas Pixtral Large does not. Rather than including them with low/zero values, this conversion omits those bias keys, matching the keys present in Mistral's original Pixtral Large upload. The model's config.json includes `"multimodal_projector_bias": false` to flag this. *n.b. If anyone in the community confirms that zero-initializing these keys is the better approach, I'm happy to reupload with them included.*

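As a minimal illustration of how a loader can key off this flag (the config excerpt below is hand-written for the example, not the full file):

```python
import json

# Hand-written excerpt of this repo's config.json, for illustration only.
config = json.loads('{"multimodal_projector_bias": false}')

def expects_projector_bias(cfg):
    # When the flag is false, a loader should not look for
    # multi_modal_projector.*.bias keys in the checkpoint.
    return cfg.get("multimodal_projector_bias", True)

print(expects_projector_bias(config))  # False
```
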
## Tokenizer
This model uses a conversion of the Mistral v7m1 tokenizer. Pixtral 12B and Pixtral Large use different tokenizers with different vocab sizes, so make sure you use the right one.

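A minimal guard against loading the wrong tokenizer could look like this (the vocab sizes passed in are placeholders; read the real values from each repo's config and tokenizer files):

```python
def check_tokenizer(model_vocab_size: int, tokenizer_vocab_size: int) -> None:
    # Pixtral 12B and Pixtral Large tokenizers differ in vocab size, so a
    # mismatch here usually means the wrong tokenizer was loaded.
    if model_vocab_size != tokenizer_vocab_size:
        raise ValueError(
            f"tokenizer vocab {tokenizer_vocab_size} != model vocab {model_vocab_size}"
        )

check_tokenizer(32768, 32768)  # placeholder values: matching sizes pass silently
```
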
## Prompting / Chat Template
The included chat_template.json supports all of Mistral's defined features, with some of my own additions.

I believe this implementation gives quite a lot of flexibility for using the model, and in my testing it has worked quite well.

Example *(line breaks added for readability)*:
```
<s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT]
[INST] [IMG]<user message>
[AVAILABLE_TOOLS] [<tool definitions>][/AVAILABLE_TOOLS][/INST]
[IMG]<assistant response>
[TOOL_CALLS] [<tool calls>][/TOOL_CALLS]
[TOOL_RESULTS] <tool results including images>[/TOOL_RESULTS]
</s>[INST] <user message>[/INST]
```

**System Prompts**:
Messages with role "system" will be parsed as `[SYSTEM_PROMPT] <content>[/SYSTEM_PROMPT]` anywhere they appear in the chat history.

This appears to work pretty well for passing extra instructions at various depths, and keeps instructions separate from the conversation.

**Allowing Non-Alternating Roles**:
Multiple user messages in a row can be provided, each separated with `[INST][/INST]`. This could work well in group conversation settings, or in environments where multiple user messages arrive before the model is invoked. Having a `[/INST]` break between messages appeared to stop the model from trying to respond to every previous message and kept it focused on the last one, while still retaining knowledge of the messages before it.

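The system-prompt and non-alternating-role rules above can be sketched in plain Python. This is an illustration of the described behaviour only, simplified to text-only messages; the real logic lives in the repo's chat_template.json (Jinja), not in this snippet:

```python
# Simplified sketch of the formatting rules described above (no images/tools).
def render(messages):
    out = ["<s>"]
    for m in messages:
        role, content = m["role"], m["content"]
        if role == "system":
            # System prompts are wrapped wherever they appear in history.
            out.append(f"[SYSTEM_PROMPT] {content}[/SYSTEM_PROMPT]")
        elif role == "user":
            # Consecutive user messages each get their own [INST] block.
            out.append(f"[INST] {content}[/INST]")
        elif role == "assistant":
            out.append(f"{content}</s>")
    return "".join(out)

msgs = [
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "First message"},
    {"role": "user", "content": "Second message"},
]
print(render(msgs))
# <s>[SYSTEM_PROMPT] Be concise.[/SYSTEM_PROMPT][INST] First message[/INST][INST] Second message[/INST]
```
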
**Image Inputs Everywhere**:
Images can now be sent in user, assistant, and tool result messages, and this actually seems to work. In one test I included an image in an assistant reply 10-15 messages back in the conversation, asked the assistant to recall what image it had previously sent, and it described the image accurately.

Having this flexibility could allow for interesting applications, for example if you were to define a tool definition for image generation:
- the tool is invoked and calls an image generation API/model
- the image is returned inside a tool result message
- the model responds with a message informed by the image it generated
- you can have further conversation about the generated image, or make revisions, with the model actually knowing what was created

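The image-generation flow above, written out as a hypothetical message list. The role and field names follow the common OpenAI-style chat schema and the `generate_image` tool is invented for illustration; this is not a confirmed TabbyAPI payload:

```python
# Hypothetical conversation for the tool-based image generation flow above.
conversation = [
    {"role": "user", "content": "Generate an image of a red fox."},
    {"role": "assistant", "tool_calls": [
        {"id": "call_1", "function": {
            "name": "generate_image",                # hypothetical tool name
            "arguments": '{"prompt": "a red fox"}'}},
    ]},
    # The generated image comes back inside the tool result, so the model
    # can see what was created and discuss or revise it later.
    {"role": "tool", "tool_call_id": "call_1", "content": [
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,<image>"}},
    ]},
    {"role": "assistant", "content": "Here is your red fox. Want any changes?"},
]
```
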
## Usage
Working in TabbyAPI with the dev branch of ExLlamaV2.

<img src="https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411/resolve/main/image-input-example.jpg">

## Available Sizes
| Repo | Bits | Head Bits | Size |
| ----------- | ------ | ------ | ------ |