Update README.md

README.md changed:

```diff
@@ -1,6 +1,14 @@
 ---
 inference: false
-license:
+license: gpl
+language:
+- en
+tags:
+- starcoder
+- wizardcoder
+- code
+- self-instruct
+- distillation
 ---
 
 <!-- header start -->
@@ -42,7 +50,9 @@ Below is an instruction that describes a task. Write a response that appropriate
 
 ## How to easily download and use this model in text-generation-webui
 
-Please make sure you're using the latest version of text-generation-webui
+Please make sure you're using the latest version of text-generation-webui.
+
+Note: this is a non-Llama model which cannot be used with ExLlama. Use Loader: AutoGPTQ.
 
 1. Click the **Model tab**.
 2. Under **Download custom model or LoRA**, enter `TheBloke/Redmond-Hermes-Coder-GPTQ`.
@@ -129,7 +139,7 @@ It was created with group_size 128 to increase inference accuracy, but without -
 
 * `gptq_model-4bit-128g.safetensors`
 * Works with AutoGPTQ in CUDA or Triton modes.
-* [ExLlama](https://github.com/turboderp/exllama)
+* Does NOT work with [ExLlama](https://github.com/turboderp/exllama) as it's not a Llama model.
 * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
 * Works with text-generation-webui, including one-click-installers.
 * Parameters: Groupsize = 128. Act Order / desc_act = False.
```
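The second hunk's context line truncates the prompt template at "Write a response that appropriate…". Assuming it is the standard Alpaca-style template this model-card family uses (check the full README before relying on the exact wording), a small helper to build prompts might look like:

```python
def make_prompt(instruction: str) -> str:
    """Build an Alpaca-style prompt.

    The template text is an assumption: the diff only shows its first
    sentence, truncated mid-word, so verify against the full README.
    """
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

prompt = make_prompt("Write a Python function that reverses a string.")
```

The resulting string is what you would pass to the tokenizer or paste into the text-generation-webui input box.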
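Since the diff rules out ExLlama for this non-Llama model, AutoGPTQ is the route from Python as well as in the webui. A minimal sketch, assuming the `auto-gptq` package is installed; the heavy imports live inside the function because actually calling it downloads the weights and needs a CUDA GPU, so it is defined here but not run:

```python
def load_redmond_hermes_coder(device: str = "cuda:0"):
    """Load TheBloke/Redmond-Hermes-Coder-GPTQ via AutoGPTQ (sketch).

    Requires a CUDA GPU and the auto-gptq + transformers packages.
    """
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    model_name = "TheBloke/Redmond-Hermes-Coder-GPTQ"
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
    # model_basename matches the file listed in the diff; use_triton=False
    # selects CUDA mode (the README says both CUDA and Triton modes work).
    model = AutoGPTQForCausalLM.from_quantized(
        model_name,
        model_basename="gptq_model-4bit-128g",
        use_safetensors=True,
        device=device,
        use_triton=False,
    )
    return tokenizer, model
```

The parameter choices mirror the file list above: 4-bit weights, groupsize 128, and desc_act = False, which trades a little accuracy for faster inference without act-order reordering.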