OPEA /

Safetensors · qwen2 · 4-bit precision · intel/auto-round

Commit 7cac2d1 · 1 Parent(s): 2a54ca0
sys-lpot-val committed: upload auto_gptq format
.gitattributes CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ vocab.json filter=lfs diff=lfs merge=lfs -text
+ added_tokens.json filter=lfs diff=lfs merge=lfs -text
+ config.json filter=lfs diff=lfs merge=lfs -text
+ generation_config.json filter=lfs diff=lfs merge=lfs -text
+ quantize_config.json filter=lfs diff=lfs merge=lfs -text
+ quantization_config.json filter=lfs diff=lfs merge=lfs -text
+ special_tokens_map.json filter=lfs diff=lfs merge=lfs -text
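The nine added entries are the LFS filter rules that `git lfs track <file>` appends for each tracked file. A minimal sketch (assuming a fresh working directory; file names taken from the diff above) that reproduces the same lines with plain Python:

```python
# Sketch: append the same nine LFS filter rules this commit adds,
# mirroring what `git lfs track <file>` writes into .gitattributes.
files = ["tokenizer_config.json", "tokenizer.json", "vocab.json",
         "added_tokens.json", "config.json", "generation_config.json",
         "quantize_config.json", "quantization_config.json",
         "special_tokens_map.json"]
with open(".gitattributes", "a") as fh:
    for f in files:
        fh.write(f"{f} filter=lfs diff=lfs merge=lfs -text\n")

rules = open(".gitattributes").read()
assert rules.count("filter=lfs diff=lfs merge=lfs -text") >= 9
print("LFS rules written")
```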
README.md CHANGED
@@ -1,3 +1,186 @@
- ---
- license: apache-2.0
- ---
+ ## Model Details
+
+ This model is an int4 model with group_size 128 of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct), generated by [intel/auto-round](https://github.com/intel/auto-round).
+
+ ## How To Use
+
+ ### INT4 Inference (CPU/HPU/CUDA)
+
+ CPU inference requires auto-round version > 0.3.1.
+
+ ```python
+ from auto_round import AutoRoundConfig  ## must import for auto-round format
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ quantized_model_dir = "OPEA/Qwen2.5-0.5B-Instruct-int4-inc"
+ tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
+
+ model = AutoModelForCausalLM.from_pretrained(
+     quantized_model_dir,
+     torch_dtype='float16',
+     device_map="auto",
+ )
+
+ ## import habana_frameworks.torch.core as htcore  ## uncomment it for HPU
+ ## import habana_frameworks.torch.hpu as hthpu  ## uncomment it for HPU
+ ## model = model.to(torch.bfloat16).to("hpu")  ## uncomment it for HPU (also add `import torch`)
+
+ prompt = "There is a girl who likes adventure,"
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": prompt}
+ ]
+
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     model_inputs.input_ids,
+     max_new_tokens=200,  ## change this to align with the official usage
+     do_sample=False  ## change this to align with the official usage
+ )
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+
+ prompt = "There is a girl who likes adventure,"
+ ## INT4:
+ """That's great to hear! What kind of adventure does the girl like? Is there anything specific she enjoys doing or exploring?"""
+
+ ## BF16:
+ """That's great to hear! What kind of adventure does the girl like? Is there anything specific she enjoys doing or exploring?"""
+
+ prompt = "9.11和9.8哪个数字大"
+ ## INT4:
+ """
+ 要比较9.11和9.8的大小,我们可以按照以下步骤进行:
+
+ 1. 首先,将两个数都转换为相同的小数形式。这里我们使用小数点前的零来方便比较。
+
+ 9.11 = 9.1100 (保留两位小数)
+ 9.8 = 9.8000 (保留两位小数)
+
+ 2. 现在,比较这两个小数:
+    - 第一位:9 和 9 相等。
+    - 第二位:第一位是相同的,都是1。
+    - 第三位:第一个数是1,第二个数是8,所以8 > 1。
+
+ 因此,9.8大于9.11。
+
+ 最终答案:9.8更大。
+ """
+ ## BF16:
+ """
+ 要比较9.11和9.8的大小,我们可以按照以下步骤进行:
+
+ 1. 首先,将两个数都转换为相同的小数形式。这里我们使用小数点前的零来方便比较。
+
+ 9.11 = 9.1100 (保留两位小数)
+ 9.8 = 9.8000 (保留两位小数)
+
+ 2. 现在,比较这两个小数:
+    - 第一位:9 和 9 相等。
+    - 第二位:第一位是相同的,都是1。
+    - 第三位:第一个数是1,第二个数是8,所以8 > 1。
+
+ 因此,9.8大于9.11。
+
+ 最终答案:9.8更大。
+ """
+
+ prompt = "Once upon a time,"
+ ## INT4:
+ """I'm sorry, but I don't understand what you're asking me to do or what information you want me to provide. Could you please clarify your question or provide more context? I'd be happy to help if you can give me all the information you need."""
+
+ ## BF16:
+ """I'm sorry, but I don't understand what you're asking me to do or what information you want me to provide. Could you please clarify your question or provide more context? I'd be happy to help if you can give me all the information you need."""
+
+ prompt = "请简短介绍一下阿里巴巴公司"
+
+ ## INT4:
+ """阿里巴巴集团是全球领先的电子商务和云计算服务提供商,成立于1999年。该公司总部位于中国杭州,并在多个国家和地区设有办事处和运营中心。阿里巴巴集团的业务包括在线零售、移动支付、云计算、人工智能等。阿里巴巴集团是中国最大的电子商务平台之一,也是全球最大的电商平台之一。阿里巴巴集团还拥有众多子公司和品牌,如淘宝、天猫、菜鸟网络等。阿里巴巴集团在全球范围内拥有超过20亿活跃用户,每年销售额超过3500亿美元。阿里巴巴集团致力于通过创新和智能化技术推动商业变革,为消费者提供更便捷、更个性化的购物体验。"""
+
+ ## BF16:
+ """阿里巴巴集团是全球领先的电子商务和云计算服务提供商,成立于1999年。该公司总部位于中国杭州,并在多个国家和地区设有办事处和运营中心。阿里巴巴集团的业务包括在线零售、移动支付、云计算、人工智能等。阿里巴巴集团是中国最大的电子商务平台之一,也是全球最大的电商平台之一。阿里巴巴集团还拥有众多子公司和品牌,如淘宝、天猫、菜鸟网络等。阿里巴巴集团在全球范围内拥有超过20亿活跃用户,每年销售额超过3500亿美元。阿里巴巴集团致力于通过创新和智能化技术推动商业变革,为消费者提供更便捷、更个性化的购物体验。"""
+ ```
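As an aside, the quoted responses for the "9.11和9.8哪个数字大" ("which number is larger, 9.11 or 9.8") prompt reach the right answer through a muddled digit-by-digit comparison; the arithmetic the prompt asks about is a one-line check:

```python
# Sanity check of the comparison posed by the sample prompt:
# 9.8 is indeed larger than 9.11, matching the model's final answer.
assert 9.8 > 9.11
print(max(9.8, 9.11))  # → 9.8
```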
+
+ ### Evaluate the model
+
+ `pip3 install lm-eval==0.4.5`
+
+ ```bash
+ auto-round --model "OPEA/Qwen2.5-0.5B-Instruct-int4-inc" --eval --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
+ ```
+
+ | Metric | BF16 | INT4 |
+ | :----------------------------------------- | :----: | :----: |
+ | Avg | 0.4229 | 0.4124 |
+ | leaderboard_ifeval inst_level_strict_acc | 0.3501 | 0.3441 |
+ | leaderboard_ifeval prompt_level_strict_acc | 0.2107 | 0.2218 |
+ | leaderboard_mmlu_pro 5 shots | 0.1877 | 0.1678 |
+ | mmlu | 0.4582 | 0.4434 |
+ | cmmlu | 0.5033 | 0.4542 |
+ | ceval-valid | 0.5327 | 0.4918 |
+ | gsm8k 5 shots | 0.2146 | 0.2267 |
+ | lambada_openai | 0.4968 | 0.4692 |
+ | hellaswag | 0.4062 | 0.3927 |
+ | winogrande | 0.5541 | 0.5675 |
+ | piqa | 0.7051 | 0.7035 |
+ | truthfulqa_mc1 | 0.2693 | 0.2815 |
+ | openbookqa | 0.2400 | 0.2200 |
+ | boolq | 0.6783 | 0.6471 |
+ | arc_easy | 0.6566 | 0.6595 |
+ | arc_challenge | 0.3020 | 0.3072 |
+
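The Avg row can be spot-checked as the unweighted mean of the sixteen per-task rows (values copied from the table above):

```python
# Spot-check the Avg row: unweighted mean over the 16 per-task metrics.
bf16 = [0.3501, 0.2107, 0.1877, 0.4582, 0.5033, 0.5327, 0.2146, 0.4968,
        0.4062, 0.5541, 0.7051, 0.2693, 0.2400, 0.6783, 0.6566, 0.3020]
int4 = [0.3441, 0.2218, 0.1678, 0.4434, 0.4542, 0.4918, 0.2267, 0.4692,
        0.3927, 0.5675, 0.7035, 0.2815, 0.2200, 0.6471, 0.6595, 0.3072]
bf16_avg = sum(bf16) / len(bf16)
int4_avg = sum(int4) / len(int4)
# Both agree with the table's Avg row (0.4229 / 0.4124) to within rounding.
assert abs(bf16_avg - 0.4229) < 1e-4
assert abs(int4_avg - 0.4124) < 1e-4
print("Avg rows consistent with the per-task metrics")
```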
+
+ ### Generate the model
+
+ Here is a sample command to reproduce the model. We observed a larger accuracy drop on Chinese tasks and recommend using a high-quality Chinese dataset for calibration, or a smaller group_size such as 32.
+
+ ```bash
+ auto-round \
+     --model Qwen/Qwen2.5-0.5B-Instruct \
+     --device 0 \
+     --group_size 128 \
+     --nsamples 512 \
+     --bits 4 \
+     --iter 1000 \
+     --disable_eval \
+     --model_dtype "fp16" \
+     --format 'auto_gptq,auto_round' \
+     --output_dir "./tmp_autoround"
+ ```
+
+ ## Ethical Considerations and Limitations
+
+ The model can produce factually incorrect output and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
+
+ Therefore, before deploying any applications of the model, developers should perform safety testing.
+
+ ## Caveats and Recommendations
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
+
+ Here is a useful link to learn more about Intel's AI software:
+
+ - Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
+
+ ## Disclaimer
+
+ The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+
+ ## Cite
+
+ @article{cheng2023optimize,
+   title={Optimize weight rounding via signed gradient descent for the quantization of llms},
+   author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
+   journal={arXiv preprint arXiv:2309.05516},
+   year={2023}
+ }
+
+ [arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:58b54bbe36fc752f79a24a271ef66a0a0830054b4dfad94bde757d851968060b
+ size 605
config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:96f5d8e1d262852583bf8492ba2c4b8d101db7d0d60f8d3e6c7a42f9b36aa4dc
+ size 1367
generation_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0dc30d5b7f022dcbfaaef3e55340642208a3b0436214346caf1c522c009f699d
+ size 242
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4e0209a213dc03574cf5f5052e2e4c8726a196bee189b27aea69fb5bcc04cb26
+ size 459946568
quantization_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d603e56eb4bb154a31dc23a83f243fc179aeb8631b8a0639837c3d06b06e8d8b
+ size 569
quantize_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:40a1be12405e831fd26943690d41ea6d85bc4d452305943a8389fc54bba336c9
+ size 559
special_tokens_map.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76862e765266b85aa9459767e33cbaf13970f327a0e88d1c65846c2ddd3a1ecd
+ size 613
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896
tokenizer_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e88129d9769a0b14b1587a7d5e829fe93ac0e1511636471fdfc0811951418e6
+ size 7306
vocab.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ca10d7e9fb3ed18575dd1e277a2579c16d108e32f27439684afa0e10b1440910
+ size 2776833