Official [AQLM](https://arxiv.org/abs/2401.06118) quantization of `meta-llama/Llama-2-7b-hf`.
For this quantization, we used 2 codebooks of 8 bits each (the 2x8 scheme in the table below).
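
As a rough sketch of what "2 codebooks of 8 bits" means for storage: each group of weights is encoded by one 8-bit index per codebook, so the index cost per weight is the total index bits divided by the group size. The group size of 8 used below is an assumption (it is not stated in this card, though it matches the "2Bit" in the model name), and the helper name `code_bits_per_weight` is hypothetical. Codebook and scale storage are ignored.

```python
def code_bits_per_weight(num_codebooks: int, bits_per_codebook: int, group_size: int) -> float:
    """Index bits spent per weight, ignoring codebook and scale overhead."""
    return num_codebooks * bits_per_codebook / group_size


# Each 8-bit codebook holds 2**8 = 256 entries.
entries_per_codebook = 2 ** 8

# group_size=8 is an assumption, not a value taken from this model card.
bits = code_bits_per_weight(num_codebooks=2, bits_per_codebook=8, group_size=8)
print(entries_per_codebook, bits)  # 256 2.0
```
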
Selected evaluation results for this and other models:

| Model | AQLM scheme | WikiText 2 PPL | Model size, GB | Hub link |
|------------|-------------|----------------|----------------|--------------------------------------------------------------------------|
| Llama-2-7b | 1x16 | 5.92 | 2.4 | [Link](https://huggingface.co/BlackSamorez/Llama-2-7b-AQLM-2Bit-1x16-hf) |
| Llama-2-7b (THIS) | 2x8 | 6.69 | 2.2 | [Link](https://huggingface.co/BlackSamorez/Llama-2-7b-AQLM-2Bit-2x8-hf) |
| Llama-2-7b | 8x8 | 7.83 | 2.2 | [Link](https://huggingface.co/BlackSamorez/Llama-2-7b-AQLM-2Bit-8x8-hf) |
| Llama-2-13b| 1x16 | 5.41 | 4.1 | [Link](https://huggingface.co/BlackSamorez/Llama-2-13b-AQLM-2Bit-1x16-hf)|
| Llama-2-70b| 1x16 | 3.96 | 18.8 | [Link](https://huggingface.co/BlackSamorez/Llama-2-70b-AQLM-2Bit-1x16-hf)|
| Llama-2-70b| 2x8 | 4.83 | 18.2 | [Link](https://huggingface.co/BlackSamorez/Llama-2-70b-AQLM-2Bit-2x8-hf) |
| Mixtral-8x7b| 1x16 | 4.37 | 12.6 | [Link](https://huggingface.co/BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf)|

**UPD** (20.02.2024).
We applied global fine-tuning on top of the quantized model, which improved the results compared to the first revision.

To learn more about inference, and for instructions on quantizing models yourself, please refer to the [official GitHub repo](https://github.com/Vahe1994/AQLM).
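
For orientation, here is a minimal loading sketch, not the authors' reference code. It assumes the `aqlm[gpu]` and `transformers` packages are installed, a CUDA GPU is available, and you have access to the gated base model `meta-llama/Llama-2-7b-hf` for the tokenizer (whether this repo ships its own tokenizer files is not stated here). The `generate` helper is hypothetical; the GitHub repo remains the authoritative source.

```python
MODEL_ID = "BlackSamorez/Llama-2-7b-AQLM-2Bit-2x8-hf"
TOKENIZER_ID = "meta-llama/Llama-2-7b-hf"  # assumption: tokenizer taken from the base model


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Load the quantized model and continue `prompt` (hypothetical helper)."""
    # Imported lazily so that defining this function does not require the packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",
        trust_remote_code=True,  # may be unnecessary on recent transformers versions
    ).to("cuda")

    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The capital of France is")` downloads the weights on first use and returns the prompt followed by the model's continuation.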