Update README.md
---
license: cc-by-nc-4.0
tags:
- exllamav2
- exl2
- Text Generation
- not-for-all-audiences
- nsfw
- Transformers
- llama
- text-generation-inference
---

# Amethyst 13B Mistral - EXL2 - 8bpw, hb8
- Model creator: [Undi](https://huggingface.co/Undi95)
- Original model: [Amethyst 13B Mistral](https://huggingface.co/Undi95/Amethyst-13B-Mistral)

## Description

- 8 bits per weight.
- 8 bits for the lm_head (output) layer of the model, instead of the typical 6.
- Works fine with 24 GB VRAM and without flash attention v2 under Windows.
- For me, it runs at about 64% of the speed of the 4-bit GPTQ version.

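As a rough sanity check on the 24 GB figure, the per-weight bitrate translates into a weights-only footprint as sketched below. The flat 13-billion parameter count is a nominal assumption; actual tensor sizes differ slightly.

```python
# Back-of-envelope VRAM estimate for the quantized weights.
# Assumes a nominal 13e9 parameters; real layer shapes vary a little.
params = 13e9            # ~13 billion weights
bits_per_weight = 8.0    # the EXL2 bitrate used for this conversion
weight_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"~{weight_gb:.0f} GB of weights")  # prints "~13 GB of weights"
```

At 8 bpw the weights alone come to roughly 13 GB, which leaves headroom for the KV cache and runtime overhead within 24 GB of VRAM.
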
I converted the model using the convert.py script from the exllamav2 repo:
https://github.com/turboderp/exllamav2
Its documentation:
https://github.com/turboderp/exllamav2/blob/master/doc/convert.md

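For reference, a conversion with these settings looks roughly like the invocation below. This is a sketch, not the exact command used: the paths are placeholders, and the flag names (`-i`, `-o`, `-cf`, `-c`, `-b`, `-hb`) are as described in the convert.md documentation linked above.

```shell
# Sketch of an EXL2 conversion at 8 bits per weight with an 8-bit head.
# All paths are placeholders; see convert.md for the full flag reference.
python convert.py \
    -i /models/Amethyst-13B-Mistral \
    -o /tmp/exl2-work \
    -cf /models/Amethyst-13B-Mistral-8bpw-exl2 \
    -c 0000.parquet \
    -b 8.0 \
    -hb 8
```
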
Measuring the model took 51 minutes, converting it 18 minutes.

I used the WikiText-2-v1 dataset for calibration:
https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/test/0000.parquet