ISTA-DASLab/Llama-2-7b-AQLM-2Bit-8x8-hf

Official [AQLM](https://arxiv.org/abs/2401.06118) quantization of `meta-llama/Llama-2-7b-hf`.

For this quantization, we used 8 codebooks of 8 bits.
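As a rough illustration of how a scheme name such as 8x8 relates to the nominal 2-bit budget in the model name: AQLM encodes each group of weights as one index per codebook, so the index cost per weight is (number of codebooks × bits per codebook) / group size. The sketch below is only this arithmetic. The function names are hypothetical, the group size is not stated in this card (it is back-solved from the "2Bit" label as an assumption), and storage for the codebooks themselves and for scales is ignored.

```python
def index_bits_per_weight(num_codebooks: int, bits_per_codebook: int, group_size: int) -> float:
    """Bits spent on codebook indices per weight (codebook and scale storage ignored)."""
    return num_codebooks * bits_per_codebook / group_size


def group_size_for_target(num_codebooks: int, bits_per_codebook: int, target_bits: float) -> float:
    """Group size needed to reach a target index budget per weight."""
    return num_codebooks * bits_per_codebook / target_bits


# 8 codebooks of 8 bits: 64 index bits per group.
# Assumption: a 2-bit budget, as suggested by "2Bit" in the model name.
print(group_size_for_target(8, 8, 2))    # 32.0
print(index_bits_per_weight(8, 8, 32))   # 2.0

# 1 codebook of 16 bits under the same assumed 2-bit budget.
print(group_size_for_target(1, 16, 2))   # 8.0
```

Under these assumptions, an 8x8 scheme reaches the same nominal index budget as a 1x16 scheme by spreading more, smaller codebooks over a larger group of weights.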

Selected evaluation results for this and other models:

| Model | AQLM scheme | WikiText 2 PPL | Model size, GB | Hub link |
|-------|-------------|----------------|----------------|----------|
| Llama-2-7b | 1x16 | 5.92 | 2.4 | [Link](https://huggingface.co/ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf) |
| Llama-2-7b | 2x8 | 6.69 | 2.2 | [Link](https://huggingface.co/ISTA-DASLab/Llama-2-7b-AQLM-2Bit-2x8-hf) |
| Llama-2-7b (THIS) | 8x8 | 6.61 | 2.2 | [Link](https://huggingface.co/ISTA-DASLab/Llama-2-7b-AQLM-2Bit-8x8-hf) |
| Llama-2-13b | 1x16 | 5.22 | 4.1 | [Link](https://huggingface.co/ISTA-DASLab/Llama-2-13b-AQLM-2Bit-1x16-hf) |
| Llama-2-70b | 1x16 | 3.83 | 18.8 | [Link](https://huggingface.co/ISTA-DASLab/Llama-2-70b-AQLM-2Bit-1x16-hf) |
| Llama-2-70b | 2x8 | 4.21 | 18.2 | [Link](https://huggingface.co/ISTA-DASLab/Llama-2-70b-AQLM-2Bit-2x8-hf) |
| Mixtral-8x7b | 1x16 | 3.35 | 12.6 | [Link](https://huggingface.co/ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf) |
| Mixtral-8x7b-Instruct | 1x16 | - | 12.6 | [Link](https://huggingface.co/ISTA-DASLab/Mixtral-8x7B-Instruct-v0_1-AQLM-2Bit-1x16-hf) |

To learn more about inference, and for information on how to quantize models yourself, please refer to the [official GitHub repo](https://github.com/Vahe1994/AQLM).
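For orientation, a minimal sketch of loading this checkpoint through the standard `transformers` API is given here. It assumes a `transformers` release with AQLM support and the `aqlm` package are installed. The `generate` helper, the prompt handling, and the device selection are illustrative choices rather than part of this card, and whether CPU inference works depends on the installed `aqlm` kernels. The GitHub repo remains the authoritative reference.

```python
MODEL_ID = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-8x8-hf"


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Hypothetical helper: load the quantized model and complete a prompt."""
    # Heavy imports live inside the function so that defining it stays cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: use a GPU when one is visible, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto").to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The capital of France is")` would download the weights on first use and reload the model on every call, so a longer-lived script would likely load the model once and reuse it.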