Thireus committed
Commit 01933fb
Parent: 95c4815

Update README.md

Files changed (1): README.md (+8 -8)
README.md CHANGED
@@ -61,34 +61,34 @@ Best results in **bold**.
 - If this model produces answers with weird characters, it means you are not using the correct version of qwopqwop200/GPTQ-for-LLaMa as mentioned below.
 - If this model produces off-topic answers or talks to itself, it means you are not using the correct checkout 508de42 of qwopqwop200/GPTQ-for-LLaMa as mentioned below.
 
-Cuda (Slow tokens/s):
+RECOMMENDED - Triton (Fast tokens/s) - Works on Windows with WSL (what I've used) or Linux:
 ```
 git clone https://github.com/oobabooga/text-generation-webui
 cd text-generation-webui
+git fetch origin pull/1229/head:triton # This is the version that supports Triton - https://github.com/oobabooga/text-generation-webui/pull/1229
+git checkout triton
 pip install -r requirements.txt
 
 mkdir repositories
 cd repositories
-git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git -b cuda # Make sure you obtain the qwopqwop200 version, not the oobabooga one! (because "act-order: yes")
+git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git # -b cuda
 cd GPTQ-for-LLaMa
+git checkout 508de42 # Before qwopqwop200 broke everything... - https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/183
 pip install -r requirements.txt
-python setup_cuda.py install
 ```
 
-Triton (Fast tokens/s) - Works on Windows with WSL (what I've used) or Linux:
+DISCOURAGED - Cuda (Slow tokens/s), with output issues (https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/128):
 ```
 git clone https://github.com/oobabooga/text-generation-webui
 cd text-generation-webui
-git fetch origin pull/1229/head:triton # This is the version that supports Triton - https://github.com/oobabooga/text-generation-webui/pull/1229
-git checkout triton
 pip install -r requirements.txt
 
 mkdir repositories
 cd repositories
-git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git # -b cuda
+git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git -b cuda # Make sure you obtain the qwopqwop200 version, not the oobabooga one! (because "act-order: yes")
 cd GPTQ-for-LLaMa
-git checkout 508de42 # Before qwopqwop200 broke everything... - https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/183
 pip install -r requirements.txt
+python setup_cuda.py install
 ```
 
 <br>
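
For anyone following the RECOMMENDED Triton route in the diff above, a quick sanity check along these lines can confirm both checkouts before launching. This is a sketch, not part of the original card: the model folder name is a placeholder, and the server.py flags (--wbits, --groupsize, --model_type) are the text-generation-webui CLI options of this era, so confirm them against `python server.py --help`.

```
# Run from the text-generation-webui directory after the Triton install above.

# The webui checkout should be on the branch fetched from PR 1229:
git branch --show-current
# expected output: triton

# GPTQ-for-LLaMa should be pinned to the known-good commit:
git -C repositories/GPTQ-for-LLaMa rev-parse --short HEAD
# expected output: a hash beginning with 508de42

# Launch with 4-bit GPTQ settings. YOUR-MODEL-FOLDER is a placeholder for
# the directory you placed under models/.
python server.py --model YOUR-MODEL-FOLDER --wbits 4 --groupsize 128 --model_type llama
```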
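
A similar check applies to the DISCOURAGED Cuda route: `setup_cuda.py` builds and installs a `quant_cuda` extension (the module name that repository's CUDA kernels import), so a failing import is the quickest sign the kernel never compiled, usually due to a CUDA toolkit / PyTorch version mismatch.

```
# Verify the CUDA kernel actually built and installed:
python -c "import quant_cuda; print('quant_cuda OK')"
```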