Update README.md
---
license: cc-by-nc-4.0
tags:
- exllamav2
- exl2
- Text Generation
- not-for-all-audiences
- nsfw
- Transformers
- llama
- text-generation-inference
---

# Amethyst 13B Mistral - EXL2 - 8bpw, hb8
- Model creator: [Undi](https://huggingface.co/Undi95)
- Original model: [Amethyst 13B Mistral](https://huggingface.co/Undi95/Amethyst-13B-Mistral)

## Description

- 8 bits per weight.
- 8 bits for the lm_head (output) layer of the model, instead of the typical 6.
- Works fine with 24 GB VRAM and without flash attention v2 under Windows.
- For me, it runs at about 64% of the speed of the 4-bit GPTQ version.

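As a rough sanity check on the 24 GB figure, the per-weight bitrate translates into a weights-only footprint as sketched below. The flat 13-billion parameter count is a nominal assumption; actual tensor sizes differ slightly.

```python
# Back-of-envelope VRAM estimate for the quantized weights.
# Assumes a nominal 13e9 parameters; real layer shapes vary a little.
params = 13e9            # ~13 billion weights
bits_per_weight = 8.0    # the EXL2 bitrate used for this conversion
weight_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"~{weight_gb:.0f} GB of weights")  # prints "~13 GB of weights"
```

At 8 bpw the weights alone come to roughly 13 GB, which leaves headroom for the KV cache and runtime overhead within 24 GB of VRAM.
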
I converted the model using the convert.py script from the exllamav2 repo:
https://github.com/turboderp/exllamav2
Its documentation:
https://github.com/turboderp/exllamav2/blob/master/doc/convert.md

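For reference, a conversion with these settings looks roughly like the invocation below. This is a sketch, not the exact command used: the paths are placeholders, and the flag names (`-i`, `-o`, `-cf`, `-c`, `-b`, `-hb`) are as described in the convert.md documentation linked above.

```shell
# Sketch of an EXL2 conversion at 8 bits per weight with an 8-bit head.
# All paths are placeholders; see convert.md for the full flag reference.
python convert.py \
    -i /models/Amethyst-13B-Mistral \
    -o /tmp/exl2-work \
    -cf /models/Amethyst-13B-Mistral-8bpw-exl2 \
    -c 0000.parquet \
    -b 8.0 \
    -hb 8
```
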
Measuring the model took 51 minutes, converting it 18 minutes.

I used the WikiText-2-v1 dataset for calibration:
https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/test/0000.parquet