Linear scaling factors are computed by minimizing the mean squared error (MSE). The SmoothQuant algorithm is used to alleviate outliers in the activations, while the GPTQ algorithm is applied for quantization.

Both algorithms are implemented in the [llm-compressor](https://github.com/vllm-project/llm-compressor) library.

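The MSE-based scale search can be illustrated with a minimal pure-Python sketch. The helper names and the candidate-scale grid below are illustrative assumptions, not the actual llm-compressor implementation:

```python
# Sketch of MSE-based selection of a symmetric linear scaling factor:
# candidate scales are shrunken versions of the abs-max scale, and the
# candidate with the lowest quantize-dequantize error is kept.

def quant_dequant(values, scale, qmax=127):
    """Symmetric round-to-nearest quantization, then dequantization."""
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # clip to the int8 range
        out.append(q * scale)
    return out

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def find_mse_scale(values, qmax=127, steps=100):
    """Search scales in [0.5, 1.0] * abs-max scale, minimizing the MSE."""
    base = max(abs(v) for v in values) / qmax
    best_scale = base
    best_err = mse(values, quant_dequant(values, base, qmax))
    for i in range(1, steps):
        scale = base * (1.0 - i / (2 * steps))
        err = mse(values, quant_dequant(values, scale, qmax))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

row = [0.02, -0.5, 0.31, 1.9, -0.07, 0.6]  # toy weight row
scale = find_mse_scale(row)
```

Shrinking the scale below the abs-max choice trades some clipping error on outliers for finer resolution on the bulk of the values, which is why the MSE-optimal scale is often smaller than the abs-max one.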

**DISCLAIMER: Be aware that quantized models show reduced response quality and possible hallucinations!**

## Use with vLLM

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the example below.