Spaces: vita-group/README

jyhong836 committed on Sep 2, 2023

Commit aa92c22 (1 parent: 3bdf225)

Update README.md

Files changed (1)

README.md +17 -3

README.md CHANGED

@@ -18,6 +18,7 @@ generative AI; graph learning, and more.
 ## Compressed LLM Model Zone
 The models are prepared by [Visual Informatics Group @ University of Texas at Austin (VITA-group)](https://vita-group.github.io/).
 License: [MIT License](https://opensource.org/license/mit/)
@@ -27,9 +28,10 @@ Setup environment
 pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
 pip install transformers==4.31.0
 pip install accelerate
 ```
-How to use
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -45,11 +47,22 @@ model = AutoModelForCausalLM.from_pretrained(
         device_map="auto"
     )
 tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
-input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids
-outputs = model.generate(input_ids)
 print(tokenizer.decode(outputs[0]))
 ```
 |    | Base Model   | Model Size   | Compression Method                                                                            | Compression Degree                                                                    |
 |---:|:-------------|:-------------|:----------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
@@ -68,3 +81,4 @@ print(tokenizer.decode(outputs[0]))
 | 12 | Llama-2      | 7b           | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured)         | [s0.3](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.3)     |
 | 13 | Llama-2      | 7b           | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured)         | [s0.5](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.5)     |
 | 14 | Llama-2      | 7b           | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured)         | [s0.6](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.6)     |
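Each link in the Compression Degree column points at a branch of the model repository, for example `.../llama-2-7b_wanda_unstructured/tree/s0.5`. The `sN` label presumably gives the target fraction of zeroed weights. The full `from_pretrained(...)` call is elided in this diff, so the following is a minimal sketch that assumes the branch is selected through the `revision` argument of `AutoModelForCausalLM.from_pretrained`. The helper names `checkpoint_from_degree_link` and `unstructured_sparsity` are hypothetical and are not part of the README.

```python
from typing import Dict, List
from urllib.parse import urlparse


def checkpoint_from_degree_link(url: str) -> Dict[str, str]:
    """Turn a Compression Degree link into keyword arguments for from_pretrained.

    Assumes links of the form https://huggingface.co/<org>/<repo>/tree/<branch>,
    as in the table above, and that the branch is passed as `revision`.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 4 or parts[2] != "tree":
        raise ValueError(f"unexpected link format: {url}")
    org, repo, _, branch = parts
    return {"pretrained_model_name_or_path": f"{org}/{repo}", "revision": branch}


def unstructured_sparsity(weight_rows: List[List[float]]) -> float:
    """Fraction of exactly-zero entries in a 2-D weight given as nested lists."""
    total = sum(len(row) for row in weight_rows)
    zeros = sum(1 for row in weight_rows for w in row if w == 0)
    return zeros / total if total else 0.0


# Example: the s0.5 row of the table.
kwargs = checkpoint_from_degree_link(
    "https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.5"
)
# kwargs == {"pretrained_model_name_or_path": "vita-group/llama-2-7b_wanda_unstructured",
#            "revision": "s0.5"}
```

Splatting the result into `AutoModelForCausalLM.from_pretrained(**kwargs, device_map="auto")` should then load that branch. On a loaded model, `(w == 0).float().mean()` on a pruned linear layer's weight tensor `w` is the vectorized equivalent of `unstructured_sparsity`, and gives a quick check against the advertised ratio.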

 ## Compressed LLM Model Zone
 The models are prepared by [Visual Informatics Group @ University of Texas at Austin (VITA-group)](https://vita-group.github.io/).
 License: [MIT License](https://opensource.org/license/mit/)
 pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
 pip install transformers==4.31.0
 pip install accelerate
+pip install auto-gptq  # for gptq
 ```
+How to use pruned models
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
         device_map="auto"
     )
 tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
+input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.cuda()
+outputs = model.generate(input_ids, max_new_tokens=128)
 print(tokenizer.decode(outputs[0]))
 ```
+How to use quantized models
+```python
+from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
+model_path = 'vita-group/llama-2-7b_wanda_2_4_gptq_4bit_128g'
+model = AutoGPTQForCausalLM.from_quantized(
+        model_path,
+        # inject_fused_attention=False, # or
+        disable_exllama=True,
+        device_map='auto',
+    )
+```
 |    | Base Model   | Model Size   | Compression Method                                                                            | Compression Degree                                                                    |
 |---:|:-------------|:-------------|:----------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
 | 12 | Llama-2      | 7b           | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured)         | [s0.3](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.3)     |
 | 13 | Llama-2      | 7b           | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured)         | [s0.5](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.5)     |
 | 14 | Llama-2      | 7b           | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured)         | [s0.6](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.6)     |
+| 15 | Llama-2      | 7b           | [wanda_gptq](https://huggingface.co/vita-group/llama-2-7b_wanda_2_4_gptq_4bit_128g)  | 4bit |
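The new row's repository name, `llama-2-7b_wanda_2_4_gptq_4bit_128g`, appears to combine Wanda pruning to 2:4 semi-structured sparsity with 4-bit GPTQ quantization in groups of 128 weights. The following minimal sketch illustrates those two storage properties under that reading, assuming the 2:4 blocks run along each row. It is not GPTQ. GPTQ picks the integer codes with second-order error compensation, whereas this uses plain round-to-nearest with one scale and zero-point per group. The real checkpoint's settings, such as symmetric vs. asymmetric quantization, are in its quantization config and are not shown here. All function names are hypothetical.

```python
from typing import List, Tuple


def satisfies_n_m_sparsity(row: List[float], n: int = 2, m: int = 4) -> bool:
    """True if every block of m consecutive weights holds at most n nonzeros."""
    if len(row) % m:
        raise ValueError("row length must be a multiple of m")
    return all(
        sum(1 for w in row[i:i + m] if w != 0) <= n
        for i in range(0, len(row), m)
    )


def quantize_groupwise(
    row: List[float], bits: int = 4, group_size: int = 128
) -> Tuple[List[int], List[Tuple[float, int]]]:
    """Round-to-nearest asymmetric quantization with one (scale, zero) per group.

    Illustration only: GPTQ chooses the integer codes differently (with
    second-order error compensation), but also stores per-group scales.
    """
    qmax = 2 ** bits - 1  # largest code: 15 for 4-bit
    codes: List[int] = []
    params: List[Tuple[float, int]] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        # Widen the range to include 0 so that 0.0 is exactly representable.
        lo, hi = min(min(group), 0.0), max(max(group), 0.0)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero = min(qmax, max(0, round(-lo / scale)))
        params.append((scale, zero))
        codes.extend(min(qmax, max(0, round(w / scale) + zero)) for w in group)
    return codes, params


def dequantize_groupwise(
    codes: List[int], params: List[Tuple[float, int]], group_size: int = 128
) -> List[float]:
    """Map integer codes back to floats using each group's (scale, zero)."""
    out: List[float] = []
    for g, (scale, zero) in enumerate(params):
        group_codes = codes[g * group_size:(g + 1) * group_size]
        out.extend((q - zero) * scale for q in group_codes)
    return out
```

Widening each group's range to include 0 makes zero exactly representable. As a result, weights removed by pruning dequantize to exactly 0.0, and the 2:4 pattern is preserved after quantization.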