Linear scaling factors are computed by minimizing the mean squared error (MSE). The SmoothQuant algorithm is used to alleviate outliers in the activations, while the GPTQ algorithm is applied for quantization.

Both algorithms are implemented in the [llm-compressor](https://github.com/vllm-project/llm-compressor) library.

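The MSE-based scale search can be illustrated with a minimal pure-Python sketch. The helper names and the candidate-scale grid below are illustrative assumptions, not the actual llm-compressor implementation:

```python
# Sketch of MSE-based selection of a symmetric linear scaling factor:
# candidate scales are shrunken versions of the abs-max scale, and the
# candidate with the lowest quantize-dequantize error is kept.

def quant_dequant(values, scale, qmax=127):
    """Symmetric round-to-nearest quantization, then dequantization."""
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # clip to the int8 range
        out.append(q * scale)
    return out

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def find_mse_scale(values, qmax=127, steps=100):
    """Search scales in [0.5, 1.0] * abs-max scale, minimizing the MSE."""
    base = max(abs(v) for v in values) / qmax
    best_scale = base
    best_err = mse(values, quant_dequant(values, base, qmax))
    for i in range(1, steps):
        scale = base * (1.0 - i / (2 * steps))
        err = mse(values, quant_dequant(values, scale, qmax))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

row = [0.02, -0.5, 0.31, 1.9, -0.07, 0.6]  # toy weight row
scale = find_mse_scale(row)
```

Shrinking the scale below the abs-max choice trades some clipping error on outliers for finer resolution on the bulk of the values, which is why the MSE-optimal scale is often smaller than the abs-max one.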

**DISCLAIMER: Be aware that quantized models show reduced response quality and possible hallucinations!**

## Use with vLLM

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the example below.