feihu.hf committed on
Commit
f2e402c
1 Parent(s): 49dc252

update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -69,6 +69,12 @@ generated_ids = [
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  ```
 
+ ## Benchmark and Speed
+
+ To compare the generation performance between bfloat16 (bf16) and quantized models such as GPTQ-Int8, GPTQ-Int4, and AWQ, please consult our [Benchmark of Quantized Models](https://qwen.readthedocs.io/en/latest/benchmark/quantization_benchmark.html). This benchmark provides insights into how different quantization techniques affect model performance.
+
+ For those interested in understanding the inference speed and memory consumption when deploying these models with either ``transformers`` or ``vLLM``, we have compiled an extensive [Speed Benchmark](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).
+
  ## Citation
 
  If you find our work helpful, feel free to give us a cite.