tsumeone committed
Commit c0fe9b2
1 Parent(s): 9fd74ab

Update README.md

Files changed (1): README.md (+10 -1)
README.md CHANGED
@@ -5,4 +5,13 @@ Big thank you to TheBloke for uploading the HF version above. Unfortunately, hi
 GPTQ quantization using https://github.com/0cc4m/GPTQ-for-LLaMa for compatibility with 0cc4m's fork of KoboldAI.
 
 Command used to quantize:
-```python llama.py c:\stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors 4bit-128g.safetensors```
+```python llama.py c:\stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors 4bit-128g.safetensors```
+
+This model works best with the following prompting. Also, it really does not like to stop on its own and will likely keep going on forever if you let it.
+
+```### Human:
+What is 2+2?
+
+### Assistant:
+
+```
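
Since the README notes the model tends to keep generating new turns instead of stopping, one practical approach is to build the prompt in the `### Human:` / `### Assistant:` format and trim the completion at the next `### Human:` marker. Below is a minimal Python sketch of that idea; `build_prompt` and `trim_reply` are hypothetical helper names, and the actual generation call is left to whatever backend you use (KoboldAI, GPTQ-for-LLaMa inference code, etc.).

```python
# Minimal sketch, assuming nothing about the inference backend:
# build_prompt/trim_reply are hypothetical helpers, not part of any library.

PROMPT_TEMPLATE = """### Human:
{question}

### Assistant:
"""


def build_prompt(question: str) -> str:
    """Wrap a question in the ### Human: / ### Assistant: format the README recommends."""
    return PROMPT_TEMPLATE.format(question=question)


def trim_reply(generated: str) -> str:
    """Cut the raw completion at the next '### Human:' marker, since the model
    tends to keep writing new turns instead of stopping on its own."""
    return generated.split("### Human:")[0].strip()


if __name__ == "__main__":
    prompt = build_prompt("What is 2+2?")
    print(prompt)
    # Send `prompt` to your backend, then pass the raw completion through
    # trim_reply() before displaying it.
```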