OPEA /

Safetensors · qwen2 · 4-bit precision · intel/auto-round

Commit 7cac2d1 · 1 Parent(s): 2a54ca0
sys-lpot-val committed: upload auto_gptq format
.gitattributes CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ vocab.json filter=lfs diff=lfs merge=lfs -text
+ added_tokens.json filter=lfs diff=lfs merge=lfs -text
+ config.json filter=lfs diff=lfs merge=lfs -text
+ generation_config.json filter=lfs diff=lfs merge=lfs -text
+ quantize_config.json filter=lfs diff=lfs merge=lfs -text
+ quantization_config.json filter=lfs diff=lfs merge=lfs -text
+ special_tokens_map.json filter=lfs diff=lfs merge=lfs -text
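The nine added entries are the LFS filter rules that `git lfs track <file>` appends for each tracked file. A minimal sketch (assuming a fresh working directory; file names taken from the diff above) that reproduces the same lines with plain Python:

```python
# Sketch: append the same nine LFS filter rules this commit adds,
# mirroring what `git lfs track <file>` writes into .gitattributes.
files = ["tokenizer_config.json", "tokenizer.json", "vocab.json",
         "added_tokens.json", "config.json", "generation_config.json",
         "quantize_config.json", "quantization_config.json",
         "special_tokens_map.json"]
with open(".gitattributes", "a") as fh:
    for f in files:
        fh.write(f"{f} filter=lfs diff=lfs merge=lfs -text\n")

rules = open(".gitattributes").read()
assert rules.count("filter=lfs diff=lfs merge=lfs -text") >= 9
print("LFS rules written")
```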
README.md CHANGED
@@ -1,3 +1,186 @@
- ---
- license: apache-2.0
- ---
+ ## Model Details
+
+ This model is an int4 model with group_size 128 of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct), generated by [intel/auto-round](https://github.com/intel/auto-round).
+
+ ## How To Use
+
+ ### INT4 Inference (CPU/HPU/CUDA)
+
+ CPU inference requires auto-round version > 0.3.1.
+
+ ```python
+ from auto_round import AutoRoundConfig  ## must import for auto-round format
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ quantized_model_dir = "OPEA/Qwen2.5-0.5B-Instruct-int4-inc"
+ tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
+
+ model = AutoModelForCausalLM.from_pretrained(
+     quantized_model_dir,
+     torch_dtype='float16',
+     device_map="auto",
+ )
+
+ ## import habana_frameworks.torch.core as htcore  ## uncomment it for HPU
+ ## import habana_frameworks.torch.hpu as hthpu  ## uncomment it for HPU
+ ## model = model.to(torch.bfloat16).to("hpu")  ## uncomment it for HPU (also add `import torch`)
+
+ prompt = "There is a girl who likes adventure,"
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": prompt}
+ ]
+
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     model_inputs.input_ids,
+     max_new_tokens=200,  ## change this to align with the official usage
+     do_sample=False  ## change this to align with the official usage
+ )
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+
+ prompt = "There is a girl who likes adventure,"
+ ## INT4:
+ """That's great to hear! What kind of adventure does the girl like? Is there anything specific she enjoys doing or exploring?"""
+
+ ## BF16:
+ """That's great to hear! What kind of adventure does the girl like? Is there anything specific she enjoys doing or exploring?"""
+
+ prompt = "9.11和9.8哪个数字大"
+ ## INT4:
+ """
+ 要比较9.11和9.8的大小,我们可以按照以下步骤进行:
+
+ 1. 首先,将两个数都转换为相同的小数形式。这里我们使用小数点前的零来方便比较。
+
+ 9.11 = 9.1100 (保留两位小数)
+ 9.8 = 9.8000 (保留两位小数)
+
+ 2. 现在,比较这两个小数:
+    - 第一位:9 和 9 相等。
+    - 第二位:第一位是相同的,都是1。
+    - 第三位:第一个数是1,第二个数是8,所以8 > 1。
+
+ 因此,9.8大于9.11。
+
+ 最终答案:9.8更大。
+ """
+ ## BF16:
+ """
+ 要比较9.11和9.8的大小,我们可以按照以下步骤进行:
+
+ 1. 首先,将两个数都转换为相同的小数形式。这里我们使用小数点前的零来方便比较。
+
+ 9.11 = 9.1100 (保留两位小数)
+ 9.8 = 9.8000 (保留两位小数)
+
+ 2. 现在,比较这两个小数:
+    - 第一位:9 和 9 相等。
+    - 第二位:第一位是相同的,都是1。
+    - 第三位:第一个数是1,第二个数是8,所以8 > 1。
+
+ 因此,9.8大于9.11。
+
+ 最终答案:9.8更大。
+ """
+
+ prompt = "Once upon a time,"
+ ## INT4:
+ """I'm sorry, but I don't understand what you're asking me to do or what information you want me to provide. Could you please clarify your question or provide more context? I'd be happy to help if you can give me all the information you need."""
+
+ ## BF16:
+ """I'm sorry, but I don't understand what you're asking me to do or what information you want me to provide. Could you please clarify your question or provide more context? I'd be happy to help if you can give me all the information you need."""
+
+ prompt = "请简短介绍一下阿里巴巴公司"
+
+ ## INT4:
+ """阿里巴巴集团是全球领先的电子商务和云计算服务提供商,成立于1999年。该公司总部位于中国杭州,并在多个国家和地区设有办事处和运营中心。阿里巴巴集团的业务包括在线零售、移动支付、云计算、人工智能等。阿里巴巴集团是中国最大的电子商务平台之一,也是全球最大的电商平台之一。阿里巴巴集团还拥有众多子公司和品牌,如淘宝、天猫、菜鸟网络等。阿里巴巴集团在全球范围内拥有超过20亿活跃用户,每年销售额超过3500亿美元。阿里巴巴集团致力于通过创新和智能化技术推动商业变革,为消费者提供更便捷、更个性化的购物体验。"""
+
+ ## BF16:
+ """阿里巴巴集团是全球领先的电子商务和云计算服务提供商,成立于1999年。该公司总部位于中国杭州,并在多个国家和地区设有办事处和运营中心。阿里巴巴集团的业务包括在线零售、移动支付、云计算、人工智能等。阿里巴巴集团是中国最大的电子商务平台之一,也是全球最大的电商平台之一。阿里巴巴集团还拥有众多子公司和品牌,如淘宝、天猫、菜鸟网络等。阿里巴巴集团在全球范围内拥有超过20亿活跃用户,每年销售额超过3500亿美元。阿里巴巴集团致力于通过创新和智能化技术推动商业变革,为消费者提供更便捷、更个性化的购物体验。"""
+ ```
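As an aside, the quoted responses for the "9.11和9.8哪个数字大" ("which number is larger, 9.11 or 9.8") prompt reach the right answer through a muddled digit-by-digit comparison; the arithmetic the prompt asks about is a one-line check:

```python
# Sanity check of the comparison posed by the sample prompt:
# 9.8 is indeed larger than 9.11, matching the model's final answer.
assert 9.8 > 9.11
print(max(9.8, 9.11))  # → 9.8
```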
+
+ ### Evaluate the model
+
+ `pip3 install lm-eval==0.4.5`
+
+ ```bash
+ auto-round --model "OPEA/Qwen2.5-0.5B-Instruct-int4-inc" --eval --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
+ ```
+
+ | Metric | BF16 | INT4 |
+ | :----------------------------------------- | :----: | :----: |
+ | Avg | 0.4229 | 0.4124 |
+ | leaderboard_ifeval inst_level_strict_acc | 0.3501 | 0.3441 |
+ | leaderboard_ifeval prompt_level_strict_acc | 0.2107 | 0.2218 |
+ | leaderboard_mmlu_pro 5 shots | 0.1877 | 0.1678 |
+ | mmlu | 0.4582 | 0.4434 |
+ | cmmlu | 0.5033 | 0.4542 |
+ | ceval-valid | 0.5327 | 0.4918 |
+ | gsm8k 5 shots | 0.2146 | 0.2267 |
+ | lambada_openai | 0.4968 | 0.4692 |
+ | hellaswag | 0.4062 | 0.3927 |
+ | winogrande | 0.5541 | 0.5675 |
+ | piqa | 0.7051 | 0.7035 |
+ | truthfulqa_mc1 | 0.2693 | 0.2815 |
+ | openbookqa | 0.2400 | 0.2200 |
+ | boolq | 0.6783 | 0.6471 |
+ | arc_easy | 0.6566 | 0.6595 |
+ | arc_challenge | 0.3020 | 0.3072 |
+
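The Avg row can be spot-checked as the unweighted mean of the sixteen per-task rows (values copied from the table above):

```python
# Spot-check the Avg row: unweighted mean over the 16 per-task metrics.
bf16 = [0.3501, 0.2107, 0.1877, 0.4582, 0.5033, 0.5327, 0.2146, 0.4968,
        0.4062, 0.5541, 0.7051, 0.2693, 0.2400, 0.6783, 0.6566, 0.3020]
int4 = [0.3441, 0.2218, 0.1678, 0.4434, 0.4542, 0.4918, 0.2267, 0.4692,
        0.3927, 0.5675, 0.7035, 0.2815, 0.2200, 0.6471, 0.6595, 0.3072]
bf16_avg = sum(bf16) / len(bf16)
int4_avg = sum(int4) / len(int4)
# Both agree with the table's Avg row (0.4229 / 0.4124) to within rounding.
assert abs(bf16_avg - 0.4229) < 1e-4
assert abs(int4_avg - 0.4124) < 1e-4
print("Avg rows consistent with the per-task metrics")
```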
+
+ ### Generate the model
+
+ Here is a sample command to reproduce the model. We observed a larger accuracy drop on Chinese tasks and recommend using a high-quality Chinese dataset for calibration, or a smaller group_size such as 32.
+
+ ```bash
+ auto-round \
+     --model Qwen/Qwen2.5-0.5B-Instruct \
+     --device 0 \
+     --group_size 128 \
+     --nsamples 512 \
+     --bits 4 \
+     --iter 1000 \
+     --disable_eval \
+     --model_dtype "fp16" \
+     --format 'auto_gptq,auto_round' \
+     --output_dir "./tmp_autoround"
+ ```
+
+ ## Ethical Considerations and Limitations
+
+ The model can produce factually incorrect output and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
+
+ Therefore, before deploying any applications of the model, developers should perform safety testing.
+
+ ## Caveats and Recommendations
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
+
+ Here is a useful link to learn more about Intel's AI software:
+
+ - Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
+
+ ## Disclaimer
+
+ The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+
+ ## Cite
+
+ @article{cheng2023optimize,
+   title={Optimize weight rounding via signed gradient descent for the quantization of llms},
+   author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
+   journal={arXiv preprint arXiv:2309.05516},
+   year={2023}
+ }
+
+ [arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:58b54bbe36fc752f79a24a271ef66a0a0830054b4dfad94bde757d851968060b
+ size 605
config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:96f5d8e1d262852583bf8492ba2c4b8d101db7d0d60f8d3e6c7a42f9b36aa4dc
+ size 1367
generation_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0dc30d5b7f022dcbfaaef3e55340642208a3b0436214346caf1c522c009f699d
+ size 242
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4e0209a213dc03574cf5f5052e2e4c8726a196bee189b27aea69fb5bcc04cb26
+ size 459946568
quantization_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d603e56eb4bb154a31dc23a83f243fc179aeb8631b8a0639837c3d06b06e8d8b
+ size 569
quantize_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:40a1be12405e831fd26943690d41ea6d85bc4d452305943a8389fc54bba336c9
+ size 559
special_tokens_map.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76862e765266b85aa9459767e33cbaf13970f327a0e88d1c65846c2ddd3a1ecd
+ size 613
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896
tokenizer_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e88129d9769a0b14b1587a7d5e829fe93ac0e1511636471fdfc0811951418e6
+ size 7306
vocab.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ca10d7e9fb3ed18575dd1e277a2579c16d108e32f27439684afa0e10b1440910
+ size 2776833