---
license: llama3.1
tags:
- llmcompressor
- GPTQ
datasets:
- openerotica/erotiquant3
---

<p align="center">
<img width="120px" alt="Sentient Simulations Plumbob" src="https://www.sentientsimulations.com/transparent-plumbob2.png">
</p>
<p align="center"><a href="https://www.sentientsimulations.com/">[🏠Sentient Simulations]</a> | <a href="https://discord.com/invite/JTjbydmUAp">[Discord]</a> | <a href="https://www.patreon.com/SentientSims">[Patreon]</a></p>
<hr>

# Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ

This repository contains a 4-bit GPTQ-quantized version of the [ArliAI Llama 3.1 70B model](https://huggingface.co/ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3), produced with [llm-compressor](https://github.com/vllm-project/llm-compressor).

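As a quick way to try the quantized weights, a vLLM load along these lines should work. This snippet is not part of this repository; the tensor-parallel size, context length, and sampling settings are illustrative assumptions, so adjust them for your hardware.

```python
from vllm import LLM, SamplingParams

# Assumption: 4 GPUs, mirroring the 4x A40 setup described below.
llm = LLM(
    model="GusPuffy/Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ",
    tensor_parallel_size=4,
    max_model_len=4096,
)

params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Write a short scene set in a rainy city."], params)
print(outputs[0].outputs[0].text)
```
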
## Quantization Settings

| **Attribute**                   | **Value**                                                                           |
|---------------------------------|-------------------------------------------------------------------------------------|
| **Algorithm**                   | GPTQ                                                                                |
| **Layers**                      | Linear                                                                              |
| **Weight Scheme**               | W4A16                                                                               |
| **Group Size**                  | 128                                                                                 |
| **Calibration Dataset**         | [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3) |
| **Calibration Sequence Length** | 4096                                                                                |
| **Calibration Samples**         | 512                                                                                 |

### Dataset Preprocessing

The dataset was preprocessed with the following steps (a code sketch follows the list):
1. Extract and structure the conversation data using role-based templates (`SYSTEM`, `USER`, `ASSISTANT`).
2. Convert the structured conversations into a tokenized format using the model's tokenizer.
3. Filter out sequences shorter than 4096 tokens.
4. Shuffle and select 512 samples for calibration.

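The snippet below is a minimal sketch of that flow, not the exact code in `compress.py`. It assumes the dataset rows expose a `conversations` list of role-tagged turns with `role`/`content` fields; the real column and field names in openerotica/erotiquant3 may differ.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

MODEL_ID = "ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3"
MAX_SEQ_LEN = 4096
NUM_SAMPLES = 512

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
ds = load_dataset("openerotica/erotiquant3", split="train")

def to_text(example):
    # Assumption: each row carries a list of role-tagged turns; field names are placeholders.
    turns = example["conversations"]
    return {"text": "\n".join(f"{t['role'].upper()}: {t['content']}" for t in turns)}

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=MAX_SEQ_LEN,
                     add_special_tokens=False)

ds = ds.map(to_text)
ds = ds.map(tokenize, remove_columns=ds.column_names)
# Drop anything that does not fill the full 4096-token calibration window.
ds = ds.filter(lambda ex: len(ex["input_ids"]) >= MAX_SEQ_LEN)
# Shuffle and keep 512 calibration samples.
ds = ds.shuffle(seed=42).select(range(NUM_SAMPLES))
```
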
## Quantization Process

The shell and Python scripts used to quantize this model are linked below.

Four A40 GPUs with 300 GB of RAM were rented on RunPod.

Quantization took approximately 11 hours, with a total of \$23.65 in compute costs. (And another \$70 of me screwing up the quants about 10 times, but anyway...)

- [compress.sh](./compress.sh)
- [compress.py](./compress.py)

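For orientation, the core quantization call with llm-compressor looks roughly like this. It is a condensed sketch mapping the settings table onto the library's documented GPTQ one-shot flow, not a copy of `compress.py`; import paths can vary between llm-compressor versions, and `ds` is assumed to be the calibration set prepared as in the preprocessing sketch above.

```python
from transformers import AutoModelForCausalLM
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")

# GPTQ over every Linear layer, 4-bit weights / 16-bit activations (W4A16, group size 128),
# leaving the output head unquantized.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=ds,  # calibration set prepared as in the preprocessing sketch above
    recipe=recipe,
    max_seq_length=4096,
    num_calibration_samples=512,
    output_dir="Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ",
)
```
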
## Acknowledgments

- Base Model: [ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3](https://huggingface.co/ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3)
- Calibration Dataset: [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3)
- LLM Compressor: [llm-compressor](https://github.com/vllm-project/llm-compressor)
- Everyone subscribed to the [Sentient Simulations Patreon](https://www.patreon.com/SentientSims)

![patreon.PNG](https://huggingface.co/GusPuffy/Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ/resolve/main/patreon.PNG)