
Official AQLM quantization of meta-llama/Llama-2-7b-hf.

For this quantization, we used 2 codebooks of 8 bits each.

Selected evaluation results for this and other models:

| Model | AQLM scheme | WikiText-2 PPL | Model size, GB | Hub link |
|---|---|---|---|---|
| Llama-2-7b | 1x16 | 6.31 | 2.4 | Link |
| Llama-2-7b (THIS) | 2x8 | 7.98 | 2.2 | Link |
| Llama-2-7b | 8x8 | 7.83 | 2.2 | Link |
| Llama-2-13b | 1x16 | 5.41 | 4.1 | Link |
| Llama-2-70b | 1x16 | 3.96 | 18.8 | Link |
| Llama-2-70b | 2x8 | 4.83 | 18.2 | Link |
| Mixtral-8x7b | 1x16 | 4.37 | 12.6 | Link |

To learn more about inference, and for instructions on quantizing models yourself, please refer to the official GitHub repo.
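As a back-of-the-envelope sanity check on the numbers above, the 2x8 scheme stores two 8-bit codes per group of weights. The group size of 8 is the AQLM default and is assumed here rather than stated in this card:

```python
# Rough compression estimate for the 2x8 AQLM scheme.
num_codebooks = 2  # "2 codebooks ..." (from this card)
code_bits = 8      # "... of 8 bits" (from this card)
group_size = 8     # AQLM default group size; assumed, not stated here

# Two 8-bit codes per group of 8 weights -> 2 bits per weight.
bits_per_weight = num_codebooks * code_bits / group_size
print(bits_per_weight)  # 2.0

# ~7e9 weights at 2 bits/weight is ~1.75 GB of codes; the 2.2 GB in
# the table also covers codebooks, scales, and non-quantized layers.
approx_gb = 7e9 * bits_per_weight / 8 / 1e9
print(round(approx_gb, 2))  # 1.75
```

This is consistent with the "2Bit" naming used for AQLM checkpoints and with the 2.2 GB size reported in the table.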