jyhong836 committed on
Commit aa92c22 · 1 Parent(s): 3bdf225

Update README.md

Files changed (1)
  1. README.md +17 -3
README.md CHANGED
@@ -18,6 +18,7 @@ generative AI; graph learning, and more.
 
  ## Compressed LLM Model Zone
 
+
  The models are prepared by [Visual Informatics Group @ University of Texas at Austin (VITA-group)](https://vita-group.github.io/).
 
  License: [MIT License](https://opensource.org/license/mit/)
@@ -27,9 +28,10 @@ Setup environment
  pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
  pip install transformers==4.31.0
  pip install accelerate
+ pip install auto-gptq # for gptq
  ```
 
- How to use
+ How to use pruned models
  ```python
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -45,11 +47,22 @@ model = AutoModelForCausalLM.from_pretrained(
  device_map="auto"
  )
  tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
- input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids
- outputs = model.generate(input_ids)
+ input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.cuda()
+ outputs = model.generate(input_ids, max_new_tokens=128)
  print(tokenizer.decode(outputs[0]))
  ```
 
+ How to use quantized models
+ ```python
+ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
+ model_path = 'vita-group/llama-2-7b_wanda_2_4_gptq_4bit_128g'
+ model = AutoGPTQForCausalLM.from_quantized(
+ model_path,
+ # inject_fused_attention=False, # or
+ disable_exllama=True,
+ device_map='auto',
+ )
+ ```
 
  | | Base Model | Model Size | Compression Method | Compression Degree |
  |---:|:-------------|:-------------|:----------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
@@ -68,3 +81,4 @@ print(tokenizer.decode(outputs[0]))
  | 12 | Llama-2 | 7b | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured) | [s0.3](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.3) |
  | 13 | Llama-2 | 7b | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured) | [s0.5](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.5) |
  | 14 | Llama-2 | 7b | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured) | [s0.6](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.6) |
+ | 15 | Llama-2 | 7b | [wanda_gptq](https://huggingface.co/vita-group/llama-2-7b_wanda_2_4_gptq_4bit_128g) | 4bit |
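
The quantized-model snippet in this change stops after loading the model, and the table links each pruned checkpoint to a branch such as `s0.3`. The following is a minimal sketch of how both pieces might be used, not part of the commit. The helper names `repo_and_revision` and `generate_text` are hypothetical. The repository and branch naming pattern is inferred from the table URLs. Passing the branch through the `revision` argument of `from_pretrained`, and reusing the `meta-llama/Llama-2-7b-hf` tokenizer for the quantized model, are assumptions, because the corresponding README lines are not shown in this diff.

```python
def repo_and_revision(base_model, method, sparsity):
    """Return (repo_id, revision) for a pruned checkpoint listed in the table.

    Hypothetical helper: the pattern is inferred from table links such as
    https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.3
    """
    repo_id = f"vita-group/{base_model}_{method}"
    revision = f"s{sparsity}"  # e.g. sparsity 0.3 maps to branch "s0.3"
    return repo_id, revision


def generate_text(model, tokenizer, prompt, max_new_tokens=128, device="cuda"):
    """Tokenize `prompt`, generate a continuation, and decode it to a string.

    Hypothetical helper: `model` is anything with a `generate` method, for
    example a pruned transformers model or an AutoGPTQ-loaded model.
    """
    # Move the prompt tokens to the device that holds the model weights,
    # mirroring the `.cuda()` call in the pruned-model snippet.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    # Pass input_ids by keyword. transformers accepts this form; the AutoGPTQ
    # wrapper is assumed to forward keyword arguments to the wrapped model.
    outputs = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0])


# Example usage (commented out: needs a GPU, network access, and the packages
# from the setup section):
#
# from auto_gptq import AutoGPTQForCausalLM
# from transformers import AutoModelForCausalLM, AutoTokenizer
#
# tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
#
# # Pruned model: choose a table row, then load the matching branch.
# repo_id, revision = repo_and_revision('llama-2-7b', 'wanda_unstructured', 0.5)
# pruned = AutoModelForCausalLM.from_pretrained(
#     repo_id, revision=revision, device_map='auto')
#
# # Quantized model: loaded as in the snippet added by this commit.
# quantized = AutoGPTQForCausalLM.from_quantized(
#     'vita-group/llama-2-7b_wanda_2_4_gptq_4bit_128g',
#     disable_exllama=True, device_map='auto')
#
# print(generate_text(quantized, tokenizer,
#                     'Hello! I am a VITA-compressed-LLM chatbot!'))
```

Keeping the generation step in one function lets the same call serve both the pruned and the quantized checkpoints, so only the loading code differs between the two.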