OPEA / Qwen2.5-72B-Instruct-int4-inc

Safetensors

sys-lpot-val committed on
Commit 2783607 · 1 Parent(s): b162b49

upload auto_round format

.gitattributes CHANGED
@@ -41,3 +41,5 @@ special_tokens_map.json filter=lfs diff=lfs merge=lfs -text
  tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
  vocab.json filter=lfs diff=lfs merge=lfs -text
+ model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
+ quantization_config.json filter=lfs diff=lfs merge=lfs -text
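The two added patterns route the new JSON files through Git LFS, like the tokenizer files above them. For reference, a minimal sketch of how such entries are typically produced with the git-lfs CLI (illustrative only; the committer may have edited .gitattributes directly):

```bash
# each `git lfs track` call appends a matching filter line to .gitattributes
git lfs track "model.safetensors.index.json"
git lfs track "quantization_config.json"
git add .gitattributes
```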
README.md CHANGED
@@ -3,22 +3,17 @@ license: apache-2.0
  datasets:
  - NeelNanda/pile-10k
  ---
-
  ## Model Details

- This model is an int4 model with group_size 128 of [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) generated by [intel/auto-round](https://github.com/intel/auto-round), auto-round is needed to run this model
+ This model is an int4 model with group_size 128 and symmetric quantization of [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) generated by [intel/auto-round](https://github.com/intel/auto-round). Load the model with revision="b162b49" to use the AutoGPTQ format.

  ## How To Use

- ### INT4 Inference
+ ### INT4 Inference (CPU/HPU/CUDA)

+ CPU requires auto-round version > 0.3.1.
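auto-round itself has to be installed first. A minimal sketch based on the source-install commands this commit removes from the README (a plain `pip install auto-round` should also work once a release newer than 0.3.1 is published; that is an assumption, not something this commit states):

```bash
# install auto-round from source, per the removed README instructions
git clone https://github.com/intel/auto-round.git
cd auto-round && pip install -vvv --no-build-isolation -e .
```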

  ```python
- ##git clone https://github.com/intel/auto-round.git
- ##cd auto-round && pip install -vvv --no-build-isolation -e .
- from auto_round import AutoHfQuantizer ##must import
- import torch
+ from auto_round import AutoRoundConfig ##must import for auto-round format
  from transformers import AutoModelForCausalLM,AutoTokenizer
  quantized_model_dir = "OPEA/Qwen2.5-72B-Instruct-int4-inc"
  tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
@@ -27,6 +22,7 @@ model = AutoModelForCausalLM.from_pretrained(
  quantized_model_dir,
  torch_dtype='auto',
  device_map="auto",
+ ##revision="b162b49" ##AutoGPTQ format
  )

  ##import habana_frameworks.torch.core as htcore ## uncomment it for HPU
@@ -48,7 +44,7 @@ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

  generated_ids = model.generate(
  model_inputs.input_ids,
- max_new_tokens=50, ##change this to align with the official usage
+ max_new_tokens=200, ##change this to align with the official usage
  do_sample=False ##change this to align with the official usage
  )
  generated_ids = [
@@ -58,75 +54,131 @@ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  print(response)

- ##prompt = "There is a girl who likes adventure,
- ##That sounds like a wonderful trait! A girl who enjoys adventure likely has a spirit of curiosity, bravery, and a willingness to explore the unknown. Whether it's trying new activities, traveling to new places, or simply seeking out new experiences, her love
-
- ##prompt = "Which one is bigger, 9.11 or 9.8"
- ##To determine which number is bigger between 9.11 and 9.8, you can compare them digit by digit:
- ##1. Compare the whole number parts:
- ## - Both numbers have the same whole number part, which is 9.
-
- ##prompt = "Once upon a time,"
- ##Once upon a time, in a land far, far away, there was a kingdom known for its beauty and prosperity. The kingdom was ruled by a wise and just king who loved his people dearly. In the heart of the kingdom stood a magnificent castle
-
- ##prompt = "请介绍一下阿里巴巴公司"
- ##阿里巴巴集团是一家中国跨国科技公司,成立于1999年,总部位于中国杭州。阿里巴巴的业务涵盖了电子商务、零售、金融、物流、云计算等多个领域,旗下拥有包括淘宝网、天猫、菜鸟网络、阿里云等
+ prompt = "There is a girl who likes adventure,"
+ ##INT4:
+ """That's great to hear! A love for adventure can lead to so many exciting experiences and personal growth. What kind of adventures does she enjoy? Is it exploring new places, trying out thrilling activities, or perhaps diving into new hobbies and challenges? Knowing more about her interests could help in suggesting fun ideas or planning future adventures.
+ """
+
+ ##BF16:
+ """That sounds like a great quality to have! A love for adventure can lead to exciting experiences and personal growth. What kind of adventures does she enjoy? Does she like exploring new places, trying new activities, or seeking out thrilling experiences? Knowing more about her interests can help suggest specific ideas or activities that might appeal to her adventurous spirit.
+ """
+
+ prompt = "9.11和9.8哪个数字大"
+ ##INT4:
+ """要比较9.11和9.8的大小,可以按照以下步骤进行:
+
+ 1. **比较整数部分**:两个数字的整数部分都是9,所以需要进一步比较小数部分。
+ 2. **比较小数部分**:
+    - 9.11的小数部分是0.11
+    - 9.8的小数部分是0.8
+
+ 3. **比较0.11和0.8**:
+    - 0.11可以写成0.110
+    - 0.8可以写成0.800
+
+ 4. **逐位比较**:
+    - 第一位:1 < 8,所以0.110 < 0.800
+
+ 因此,9.11 < 9.8。
+
+ 结论:9.8比9.11大。"""
+
+ ##BF16:
+ """比较两个数字 9.11 和 9.8,可以按照以下步骤进行:
+
+ 1. **整数部分**:两个数字的整数部分都是 9,所以需要比较小数部分。
+ 2. **小数部分**:
+    - 9.11 的小数部分是 0.11
+    - 9.8 的小数部分是 0.8
+
+ 3. **比较小数部分**:
+    - 0.11 和 0.8 比较时,0.8 明显大于 0.11。
+
+ 因此,9.8 大于 9.11。"""
+
+ prompt = "Once upon a time,"
+ ##INT4:
+ """Once upon a time, in a far-off land, there was a kingdom filled with wonder and magic. The kingdom was ruled by a wise and just king who loved his people dearly. In the heart of the kingdom stood a magnificent castle, surrounded by lush forests and rolling hills.
+
+ The people of the kingdom lived happily, tending to their farms, crafting beautiful goods, and enjoying the simple pleasures of life. However, one day, a great darkness began to spread across the land. A wicked sorcerer had risen from the shadows, seeking to claim the throne for himself and plunge the kingdom into chaos.
+
+ The king, knowing that he could not face this threat alone, called upon the bravest and most skilled heroes from all corners of the realm. Among them was a young knight named Sir Cedric, who had earned a reputation for his courage and unwavering sense of justice.
+
+ Sir Cedric, along with a group of loyal companions, set out on a perilous journey to stop the sor"""
+
+ ##BF16:
+ """Once upon a time, in a land far, far away, there was a kingdom known for its beauty and prosperity. The kingdom was ruled by a wise and just king who loved his people dearly. In the heart of the kingdom stood a magnificent castle, surrounded by lush gardens and sparkling fountains.
+
+ The king had a young daughter named Princess Elara, who was as kind and gentle as she was beautiful. She spent her days helping the poor and spreading joy throughout the kingdom. The people adored her, and she was beloved by all.
+
+ One day, a great challenge arose. A dark forest on the outskirts of the kingdom began to grow wild and dangerous, threatening the safety of the villagers. The king called for a hero to tame the forest and protect his people. Many brave knights and warriors came forward, but none could succeed.
+
+ Princess Elara, determined to help, decided to venture into the forest herself. Her father was hesitant, but he saw the determination in her eyes and knew"""
+
+
+ prompt = "请简短介绍一下阿里巴巴公司"
+ ##INT4:
+ """阿里巴巴集团是一家总部位于中国杭州的全球领先的电子商务和科技公司。它成立于1999年,由马云和他的团队创立。阿里巴巴旗下拥有包括淘宝、天猫、阿里云等在内的多个知名业务平台,涵盖了在线零售、批发贸易、云计算、数字娱乐、金融服务等多个领域。
+
+ 阿里巴巴的愿景是让世界各地的企业都能够平等地进行贸易,通过技术创新推动数字经济的发展,为社会创造更多的价值。目前,阿里巴巴已经发展成为世界领先的互联网公司之一,业务遍布全球多个国家和地区,服务着数以亿计的用户和商家。"""
+
+ ##BF16:
+ """阿里巴巴集团是一家总部位于中国杭州的全球领先的电子商务和科技公司。成立于1999年,阿里巴巴最初是一个B2B在线市场,旨在连接中国制造商与全球买家。经过二十多年的发展,阿里巴巴已经发展成为涵盖电子商务、金融、物流、云计算等多个领域的综合性企业集团。
+
+ 阿里巴巴旗下拥有淘宝网、天猫、菜鸟网络、阿里云等知名品牌,为消费者提供购物、支付、娱乐等多元化服务,同时也为企业提供营销、销售、物流和技术支持等全方位解决方案。此外,阿里巴巴还积极投资和孵化创新项目,推动数字经济的发展。
+
+ 阿里巴巴始终秉持“让天下没有难做的生意”的使命,致力于通过技术创新促进全球经济的可持续发展。"""
  ```

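As an aside on the `revision="b162b49"` hint in Model Details: a minimal sketch of loading the AutoGPTQ-format weights kept in the parent commit rather than the auto_round format uploaded here (`revision` is the standard `from_pretrained` argument; the hash comes from this commit's header):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "OPEA/Qwen2.5-72B-Instruct-int4-inc"

# b162b49 is the parent commit, which stores the AutoGPTQ-format weights
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype="auto",
    device_map="auto",
    revision="b162b49",
)
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, revision="b162b49")
```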
- ### Evaluate the model
+ ### Evaluate the model

- pip3 install lm-eval==0.4.4
+ pip3 install lm-eval==0.4.5

  ```bash
- git clone https://github.com/intel/auto-round
- cd auto-round
- python -m auto_round --model "OPEA/Qwen2.5-72B-Instruct-int4-inc" --eval --eval_bs 16 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k,cmmlu,ceval-valid
+ auto-round --model "OPEA/Qwen2.5-72B-Instruct-int4-inc" --eval --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
  ```

- | Metric | BF16 | INT4 |
- | :------------- | :----: | :----: |
- | Avg | 0.7582 | 0.7567 |
- | mmlu | 0.8336 | 0.8306 |
- | cmmlu | 0.8722 | 0.8638 |
- | ceval-valid | 0.8982 | 0.8938 |
- | lambada_openai | 0.7518 | 0.7603 |
- | hellaswag | 0.7040 | 0.6970 |
- | winogrande | 0.7577 | 0.7695 |
- | piqa | 0.8335 | 0.8270 |
- | truthfulqa_mc1 | 0.5288 | 0.5202 |
- | openbookqa | 0.3860 | 0.3900 |
- | boolq | 0.9046 | 0.9080 |
- | arc_easy | 0.8603 | 0.8577 |
- | arc_challenge | 0.6169 | 0.6109 |
- | gsm8k 5 shots | 0.9090 | 0.9083 |
+ | Metric | BF16 | INT4 |
+ | :----------------------------------------- | :----: | :----: |
+ | Avg | 0.7413 | 0.7448 |
+ | leaderboard_mmlu_pro 5 shots | 0.5919 | 0.5864 |
+ | leaderboard_ifeval inst_level_strict_acc | 0.7770 | 0.7866 |
+ | leaderboard_ifeval prompt_level_strict_acc | 0.6858 | 0.6932 |
+ | mmlu | 0.8334 | 0.8308 |
+ | cmmlu | 0.8727 | 0.8673 |
+ | ceval-valid | 0.8975 | 0.8960 |
+ | gsm8k 5 shots | 0.9037 | 0.9098 |
+ | lambada_openai | 0.7518 | 0.7563 |
+ | hellaswag | 0.7031 | 0.7014 |
+ | winogrande | 0.7601 | 0.7687 |
+ | piqa | 0.8313 | 0.8232 |
+ | truthfulqa_mc1 | 0.5239 | 0.5263 |
+ | openbookqa | 0.3860 | 0.3820 |
+ | boolq | 0.9049 | 0.9046 |
+ | arc_easy | 0.8632 | 0.8611 |
+ | arc_challenge | 0.6135 | 0.6237 |


- ### Reproduce the model
-
- Here is the sample command to reproduce the model. We observed a larger accuracy drop in Chinese tasks and recommend using a high-quality Chinese dataset for calibration. However, we did not achieve better accuracy with some public datasets.
+ ### Generate the model

+ Here is the sample command to generate the model.

  ```bash
- git clone https://github.com/intel/auto-round
- cd auto-round
- python -m auto_round \
- --model_name Qwen/Qwen2.5-72B-Instruct \
+ auto-round \
+ --model Qwen/Qwen2.5-72B-Instruct \
  --device 0 \
  --group_size 128 \
  --nsamples 512 \
  --bits 4 \
  --iter 1000 \
  --disable_eval \
- --model_dtype "float16" \
- --format 'auto_round' \
+ --model_dtype "fp16" \
+ --format 'auto_gptq,auto_round' \
  --output_dir "./tmp_autoround"
  ```
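For readers who prefer the Python API to the CLI, a rough equivalent of the command above, assuming the `AutoRound` class and `save_quantized` signature documented in the intel/auto-round README (argument names can shift between versions, so treat this as a sketch):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-72B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="float16")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# mirror the CLI flags: 4 bits, group size 128, 512 calibration samples, 1000 tuning steps
autoround = AutoRound(model, tokenizer, bits=4, group_size=128,
                      nsamples=512, iters=1000)
autoround.quantize()

# the CLI exports both formats at once; the API takes one format string per call
autoround.save_quantized("./tmp_autoround", format="auto_round")
```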

-
-
  ## Ethical Considerations and Limitations

  The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
@@ -139,15 +191,12 @@ Users (both direct and downstream) should be made aware of the risks, biases and

  Here are a couple of useful links to learn more about Intel's AI software:

- * Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
- * Intel Extension for Transformers [link](https://github.com/intel/intel-extension-for-transformers)
+ - Intel Neural Compressor [link](https://github.com/intel/neural-compressor)

  ## Disclaimer

  The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

-
-
  ## Cite

  @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }
config.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5cf5ce83e29709fbad13150e1f42418d71ae6fdd05d544339df913887681ed7c
- size 1374
+ oid sha256:577c1aac4c764781fe8a97c6627a7d8b9c870215d3dc373884852d17daa9e859
+ size 1388
model-00001-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dddf3fdd44beb20160150391fc689f51ae51e68e3da93589227ad07a3c637225
+ size 4977604760
model-00002-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e9996f62a04433689c7fd942cd7f9c2d8baba00400952024b732d4bf9e7c9b32
+ size 4893894648
model-00003-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e4b9ef75339a7c23bdc245573be342868ae5e5cc9ba1fcc32c3f11890fe28631
+ size 4984871048
model-00004-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2c2ed066ed1aca1c6c72a1982c20c4a4f2a8d60876598fb5fcd9648f7e03b6a4
+ size 4976067496
model-00005-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02d27985f48d8dc38c5a492f922936d4cb64a897dfc32641645235a720eebf59
+ size 4893776280
model-00006-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d4481ce2269fbe94b44799737e9240ab70e38db8fcda3183ae7c2df0597b17e
+ size 4893894800
model-00007-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0cf75b65e52d3f3a9367f47f2bd03e0283f0ce82c4f0cf443eb8a6c916ece8f5
+ size 4893894808
model-00008-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ea3bb9b89b92f3d738b556c36afb0ffd29662a034608fcf7ac73f48474c462ce
+ size 4484842920
model-00009-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:220efb428e7b47b14df2fd0e7afb1da052f6d9a1f0b5ea7beeb440dc8247f949
+ size 2491416704
model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:042c113fd1f8f39103d5bcf865e4eacc3287a136416fab30c7b06185fde4ec12
+ size 215614
quantization_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88102526dda3055b482da613fdd3bd449c2877728125c5f4fae365afdab6b59f
+ size 574
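Each of the blocks added above is a Git LFS pointer (spec version, sha256 oid, byte size) rather than the file itself. A minimal sketch for fetching a resolved file, assuming the standard huggingface_hub download API:

```python
from huggingface_hub import hf_hub_download

# resolves the LFS pointer and downloads the actual file content
path = hf_hub_download(
    repo_id="OPEA/Qwen2.5-72B-Instruct-int4-inc",
    filename="quantization_config.json",
)
print(path)
```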