---
title: README
emoji: πŸ‡
colorFrom: pink
colorTo: indigo
sdk: static
pinned: false
---

# Visual Informatics Group @ University of Texas at Austin ([VITA-Group](https://vita-group.github.io/))

At the VITA group, we have unusually broad and forever-evolving research interests, spanning from the theory to the 
application aspects of machine learning (ML). Our current "research keywords" include, but are not limited to: 
sparsity (from classical optimization to modern neural networks); efficient training, inference, or transfer 
(especially of large foundation models); robustness and trustworthiness; learning to optimize (L2O); 
generative AI; graph learning; and more.


## Compressed LLM Model Zone

**NOTE: All compressed LLMs have been moved to a new repo at [compressed-llm](https://huggingface.co/compressed-llm).**
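For example, a checkpoint previously hosted under the `vita-group/` prefix should now be loaded from the `compressed-llm` organization. A minimal sketch, assuming the repository keeps the same name and revision scheme under the new organization (the exact path below is an assumption; check the new repo's listing):
```python
from transformers import AutoModelForCausalLM

# Hypothetical path: swap the 'vita-group/' prefix for 'compressed-llm/',
# keeping the repository name and revision scheme unchanged.
model = AutoModelForCausalLM.from_pretrained(
    'compressed-llm/llama-2-7b_magnitude_unstructured',  # assumed name
    revision='s0.2',
)
```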

The models are prepared by [Visual Informatics Group @ University of Texas at Austin (VITA-group)](https://vita-group.github.io/). Credits to Ajay Jaiswal, Zhenyu Zhang, Zhangheng Li, Lu Yin, Shiwei Liu and Junyuan Hong.

License: [MIT License](https://opensource.org/license/mit/)

Set up the environment
```shell
pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
pip install transformers==4.31.0
pip install accelerate
pip install auto-gptq  # for gptq
```
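Before loading any models, it can help to verify that the pinned versions installed correctly; a minimal sanity check (not part of the original instructions):
```python
import torch
import transformers

# Confirm the pinned versions and that a CUDA device is visible,
# since the examples below place weights with device_map='auto'.
print(torch.__version__)          # expected: 2.0.0+cu117
print(transformers.__version__)   # expected: 4.31.0
print(torch.cuda.is_available())  # should be True for GPU inference
```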

How to use pruned models
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = 'llama-2-7b'
comp_method = 'magnitude_unstructured'
comp_degree = 0.2  # fraction of weights pruned
model_path = f'vita-group/{base_model}_{comp_method}'
# Each compression degree lives on its own branch, named s<degree>.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    revision=f's{comp_degree}',
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map='auto',
)
# The pruned models reuse the original Llama-2 tokenizer.
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.cuda()
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
```
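Each compression degree is published as a separate branch (revision) of the model repository; for example, `s0.2` denotes 20% sparsity. To discover which degrees a repository offers, one option is to list its branches with `huggingface_hub` (a sketch; the exact branch set varies per repository):
```python
from huggingface_hub import list_repo_refs

# Each branch of the repo corresponds to one compression degree,
# e.g. s0.1 through s0.6 in the table below.
refs = list_repo_refs('vita-group/llama-2-7b_magnitude_unstructured')
print([branch.name for branch in refs.branches])
```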

How to use wanda+gptq models
```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_path = 'vita-group/llama-2-7b_wanda_2_4_gptq_4bit_128g'
tokenizer_path = 'meta-llama/Llama-2-7b-hf'
model = AutoGPTQForCausalLM.from_quantized(
    model_path,
    # inject_fused_attention=False,  # alternative to disable_exllama
    disable_exllama=True,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)
input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.to('cuda')
outputs = model.generate(input_ids=input_ids, max_length=128)
print(tokenizer.decode(outputs[0]))
```

How to use gptq models
```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_path = 'vita-group/vicuna-7b-v1.3_gptq'
tokenizer_path = 'lmsys/vicuna-7b-v1.3'
model = AutoGPTQForCausalLM.from_quantized(
    model_path,
    # inject_fused_attention=False,  # alternative to disable_exllama
    disable_exllama=True,
    device_map='auto',
    revision='2bit_128g',  # pick a bit width from the table below
)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)
input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.to('cuda')
outputs = model.generate(input_ids=input_ids, max_length=128)
print(tokenizer.decode(outputs[0]))
```
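Lower bit widths mainly pay off in weight memory: a 4-bit GPTQ checkpoint needs roughly a quarter of the weight storage of the fp16 original. A rough way to check the savings on your own hardware (a sketch using standard PyTorch counters, not a script from this repository):
```python
import torch

# Run right after loading a model with device_map='auto' to see
# how much GPU memory its weights have claimed.
torch.cuda.synchronize()
print(f'allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB')
print(f'peak:      {torch.cuda.max_memory_allocated() / 1e9:.2f} GB')
```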


|    | Base Model   | Model Size   | Compression Method                                                                              | Compression Degree                                                                                |
|---:|:-------------|:-------------|:------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|
|  0 | Llama-2      | 13b          | [magnitude_semistruct](https://huggingface.co/vita-group/llama-2-13b_magnitude_semistruct)      | [0.5_2to4](https://huggingface.co/vita-group/llama-2-13b_magnitude_semistruct/tree/0.5_2to4)      |
|  1 | Llama-2      | 13b          | [sparsegpt_semistruct](https://huggingface.co/vita-group/llama-2-13b_sparsegpt_semistruct)      | [0.5_2to4](https://huggingface.co/vita-group/llama-2-13b_sparsegpt_semistruct/tree/0.5_2to4)      |
|  2 | Llama-2      | 7b           | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured)   | [s0.1](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.1)             |
|  3 | Llama-2      | 7b           | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured)   | [s0.2](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.2)             |
|  4 | Llama-2      | 7b           | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured)   | [s0.3](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.3)             |
|  5 | Llama-2      | 7b           | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured)   | [s0.5](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.5)             |
|  6 | Llama-2      | 7b           | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured)   | [s0.6](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.6)             |
|  7 | Llama-2      | 7b           | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured)   | [s0.1](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.1)             |
|  8 | Llama-2      | 7b           | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured)   | [s0.2](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.2)             |
|  9 | Llama-2      | 7b           | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured)   | [s0.3](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.3)             |
| 10 | Llama-2      | 7b           | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured)   | [s0.5](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.5)             |
| 11 | Llama-2      | 7b           | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured)   | [s0.6](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.6)             |
| 12 | Llama-2      | 7b           | [wanda_gptq](https://huggingface.co/vita-group/llama-2-7b_wanda_2_4_gptq_4bit_128g)             | 4bit_128g                                                                                         |
| 13 | Llama-2      | 7b           | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured)           | [s0.1](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.1)                 |
| 14 | Llama-2      | 7b           | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured)           | [s0.2](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.2)                 |
| 15 | Llama-2      | 7b           | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured)           | [s0.3](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.3)                 |
| 16 | Llama-2      | 7b           | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured)           | [s0.5](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.5)                 |
| 17 | Llama-2      | 7b           | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured)           | [s0.6](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.6)                 |
| 18 | Llama-2-chat | 13b          | [magnitude_semistruct](https://huggingface.co/vita-group/llama-2-13b-chat_magnitude_semistruct) | [0.5_2to4](https://huggingface.co/vita-group/llama-2-13b-chat_magnitude_semistruct/tree/0.5_2to4) |
| 19 | Llama-2-chat | 13b          | [sparsegpt_semistruct](https://huggingface.co/vita-group/llama-2-13b-chat_sparsegpt_semistruct) | [0.5_2to4](https://huggingface.co/vita-group/llama-2-13b-chat_sparsegpt_semistruct/tree/0.5_2to4) |
| 20 | vicuna       | 13b          | [magnitude_semistruct](https://huggingface.co/vita-group/vicuna-13b-v1.3_magnitude_semistruct)  | [0.5_2to4](https://huggingface.co/vita-group/vicuna-13b-v1.3_magnitude_semistruct/tree/0.5_2to4)  |
| 21 | vicuna       | 13b          | [sparsegpt_semistruct](https://huggingface.co/vita-group/vicuna-13b-v1.3_sparsegpt_semistruct)  | [0.5_2to4](https://huggingface.co/vita-group/vicuna-13b-v1.3_sparsegpt_semistruct/tree/0.5_2to4)  |
| 22 | vicuna-v1.3  | 13b          | [gptq](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq)                                  | [10bit_128g](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq/tree/10bit_128g)              |
| 23 | vicuna-v1.3  | 13b          | [gptq](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq)                                  | [12bit_128g](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq/tree/12bit_128g)              |
| 24 | vicuna-v1.3  | 13b          | [gptq](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq)                                  | [14bit_128g](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq/tree/14bit_128g)              |
| 25 | vicuna-v1.3  | 13b          | [gptq](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq)                                  | [2bit_128g](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq/tree/2bit_128g)                |
| 26 | vicuna-v1.3  | 13b          | [gptq](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq)                                  | [3bit_128g](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq/tree/3bit_128g)                |
| 27 | vicuna-v1.3  | 13b          | [gptq](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq)                                  | [4bit_128g](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq/tree/4bit_128g)                |
| 28 | vicuna-v1.3  | 13b          | [gptq](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq)                                  | [8bit_128g](https://huggingface.co/vita-group/vicuna-13b-v1.3_gptq/tree/8bit_128g)                |
| 29 | vicuna-v1.3  | 7b           | [gptq](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq)                                   | [10bit_128g](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq/tree/10bit_128g)               |
| 30 | vicuna-v1.3  | 7b           | [gptq](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq)                                   | [12bit_128g](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq/tree/12bit_128g)               |
| 31 | vicuna-v1.3  | 7b           | [gptq](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq)                                   | [14bit_128g](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq/tree/14bit_128g)               |
| 32 | vicuna-v1.3  | 7b           | [gptq](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq)                                   | [2bit_128g](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq/tree/2bit_128g)                 |
| 33 | vicuna-v1.3  | 7b           | [gptq](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq)                                   | [3bit_128g](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq/tree/3bit_128g)                 |
| 34 | vicuna-v1.3  | 7b           | [gptq](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq)                                   | [4bit_128g](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq/tree/4bit_128g)                 |
| 35 | vicuna-v1.3  | 7b           | [gptq](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq)                                   | [8bit_128g](https://huggingface.co/vita-group/vicuna-7b-v1.3_gptq/tree/8bit_128g)                 |
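
The semi-structured (2:4) checkpoints in the table follow the same pattern as the pruned-model example above, with the revision set to `0.5_2to4` (50% sparsity in a 2:4 pattern); for instance:
```python
import torch
from transformers import AutoModelForCausalLM

# Load the 2:4 semi-structured magnitude-pruned Llama-2 13B;
# the revision name comes from the table above.
model = AutoModelForCausalLM.from_pretrained(
    'vita-group/llama-2-13b_magnitude_semistruct',
    revision='0.5_2to4',
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map='auto',
)
```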


## Citations

If you use the models in this hub, please consider citing our papers.
```bibtex
@article{jaiswal2023emergence,
  title={The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter},
  author={Jaiswal, Ajay and Liu, Shiwei and Chen, Tianlong and Wang, Zhangyang},
  journal={arXiv},
  year={2023}
}
@article{jaiswal2023compressing,
  title={Compressing LLMs: The Truth is Rarely Pure and Never Simple},
  author={Jaiswal, Ajay and Gan, Zhe and Du, Xianzhi and Zhang, Bowen and Wang, Zhangyang and Yang, Yinfei},
  journal={arXiv},
  year={2023}
}
```


For any questions, please contact [Junyuan Hong](mailto:[email protected]).