cicdatopea committed (verified)
Commit cfbb6b2 · 1 parent: 22466da
Files changed (1):
  1. README.md +5 -5
README.md CHANGED

@@ -7,13 +7,14 @@ base_model:
 
 
 
+
 ---
 
 ## Model Details
 
-This model is an int2 model with group_size 64 and symmetric quantization of [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) generated by the [intel/auto-round](https://github.com/intel/auto-round) algorithm. Some layers fall back to 4/16 bits; refer to the "Generate the model" section for details of the mixed-bit settings.
+This model is an int2 model with group_size 64 and symmetric quantization of [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) generated by the [intel/auto-round](https://github.com/intel/auto-round) algorithm. We recommend using the mixed version [OPEA/DeepSeek-R1-int2-mixed-sym-inc](https://huggingface.co/OPEA/DeepSeek-R1-int2-mixed-sym-inc) for better accuracy.
 
-Please follow the license of the original model. This model could **NOT** run on other serving frameworks.
+Please follow the license of the original model.
 
 ## How To Use
 
@@ -240,7 +241,8 @@ model = AutoModelForCausalLM.from_pretrained(
     torch_dtype=torch.bfloat16,
     trust_remote_code=True,
     device_map="cpu",
-    quantization_config=quantization_config
+    quantization_config=quantization_config,
+    revision="080ef2d"
 )
 
 
@@ -416,7 +418,6 @@ model = AutoModelForCausalLM.from_pretrained(
     torch_dtype=torch.float16,
     trust_remote_code=True,
     device_map=device_map,
-    revision="080ef2d"
 )
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
@@ -504,7 +505,6 @@ from auto_round import AutoRound
 
 
 
-
 autoround = AutoRound(model=model, tokenizer=tokenizer, device_map=device_map, bits=2, group_size=64,
                       iters=1000, batch_size=4, seqlen=512, nsamples=512, enable_torch_compile=False,
                       )
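For context, the quantization scheme the README names (symmetric 2-bit weights with group_size 64) can be sketched in a few lines of NumPy. This is illustrative arithmetic only, under assumed conventions (signed int2 levels {-2, -1, 0, 1}, per-group absmax scaling), not the auto-round algorithm itself, which additionally tunes rounding via sign-gradient descent over the `iters` steps shown above.

```python
import numpy as np

def fake_quantize_int2_sym(w: np.ndarray, group_size: int = 64) -> np.ndarray:
    """Fake-quantize a 1-D weight vector: symmetric 2-bit, one scale per group.

    Assumed conventions (not taken from auto-round's source):
    signed int2 grid {-2, -1, 0, 1}, scale = per-group absmax / 2.
    """
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / 2.0
    scale[scale == 0] = 1.0                        # guard all-zero groups
    q = np.clip(np.round(groups / scale), -2, 1)   # round onto the int2 grid
    return (q * scale).reshape(-1)                 # dequantize back to float

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
w_q = fake_quantize_int2_sym(w)
# after quantization, each 64-element group holds at most 4 distinct values
```

Raising any group's bit-width (the "fallback to 4/16 bits" mentioned for the mixed variant) simply widens the clip range and grid, trading storage for reduced rounding error on sensitive layers.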