Remek committed on
Commit
f14a098
1 Parent(s): 6592e4c

Update README.md

Files changed (1):
  1. README.md +2 -0
README.md CHANGED
@@ -26,6 +26,8 @@ Activations are quantized with a symmetric dynamic per-token scheme, computing a
  Linear scaling factors are computed by minimizing the mean squared error (MSE). The SmoothQuant algorithm is used to alleviate outliers in the activations, whereas the GPTQ algorithm is applied for quantization.
  Both algorithms are implemented in the [llm-compressor](https://github.com/vllm-project/llm-compressor) library.

+ **DISCLAIMER: Be aware that quantized models may show reduced response quality and possible hallucinations!**
+
  ## Use with vLLM

  This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the example below.
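The hunk context above mentions that activations are quantized with a symmetric dynamic per-token scheme. A minimal NumPy sketch of that idea follows; it is illustrative only, and names such as `quantize_per_token` are assumptions, not the llm-compressor API.

```python
import numpy as np

def quantize_per_token(x: np.ndarray, num_bits: int = 8):
    """Symmetric dynamic per-token quantization (illustrative sketch).

    Each row (one token's activations) gets its own scale, computed at
    runtime from the row's absolute maximum, so outlier-heavy tokens do
    not force a single global scale on every other token.
    """
    qmax = 2 ** (num_bits - 1) - 1                     # 127 for int8
    scales = np.abs(x).max(axis=-1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)        # avoid divide-by-zero
    q = np.clip(np.round(x / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scales

# Two "tokens" with very different dynamic ranges
acts = np.array([[0.05, -0.02, 0.01],
                 [30.0, -12.0, 4.0]], dtype=np.float32)
q, scales = quantize_per_token(acts)
recon = dequantize(q, scales)
```

Because the scale is recomputed per token at inference time (the "dynamic" part), the small-magnitude first row is not crushed to zero by the large-magnitude second row.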