---
license: llama3.1
tags:
- llmcompressor
- GPTQ
datasets:
- openerotica/erotiquant3
---

<p align="center">
<img width="120px" alt="Sentient Simulations Plumbob" src="https://www.sentientsimulations.com/transparent-plumbob2.png">
</p>
<p align="center"><a href="https://www.sentientsimulations.com/">[🏠Sentient Simulations]</a> | <a href="https://discord.com/invite/JTjbydmUAp">[Discord]</a> | <a href="https://www.patreon.com/SentientSims">[Patreon]</a></p>
<hr>

# Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ

This repository contains a 4-bit GPTQ-quantized version of the [ArliAI Llama 3.1 70B model](https://huggingface.co/ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3), created with [llm-compressor](https://github.com/vllm-project/llm-compressor).
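A minimal serving sketch, assuming vLLM is installed (its GPTQ/compressed-tensors loader reads W4A16 checkpoints like this one). The prompt, sampling settings, and `tensor_parallel_size` are illustrative assumptions, not part of this repo:

```python
# Hypothetical usage sketch: serving this W4A16 GPTQ checkpoint with vLLM.
# Sampling settings and tensor_parallel_size below are illustrative only.
MODEL_ID = "GusPuffy/Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ"

def main():
    # Imported lazily; requires `pip install vllm` and enough GPU memory
    # for the ~35 GB of 4-bit weights (e.g. a multi-GPU pod).
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL_ID, max_model_len=4096, tensor_parallel_size=4)
    params = SamplingParams(temperature=0.8, max_tokens=256)
    outputs = llm.generate(["Write an opening scene for a story."], params)
    print(outputs[0].outputs[0].text)

if __name__ == "__main__":
    main()
```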

## Quantization Settings

| **Attribute** | **Value** |
|---------------------------------|------------------------------------------------------------------------------------|
| **Algorithm** | GPTQ |
| **Layers** | Linear |
| **Weight Scheme** | W4A16 |
| **Group Size** | 128 |
| **Calibration Dataset** | [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3) |
| **Calibration Sequence Length** | 4096 |
| **Calibration Samples** | 512 |

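The settings in the table map onto an llm-compressor one-shot run roughly like the sketch below. This is a hedged reconstruction, not the actual script — `compress.py` in this repo is authoritative, and argument names can differ across llm-compressor releases:

```python
# Hypothetical sketch of the one-shot GPTQ run implied by the settings table;
# see compress.py in this repo for the real code.
MODEL_ID = "ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3"
MAX_SEQ_LEN = 4096   # calibration sequence length
NUM_SAMPLES = 512    # calibration samples

def quantize():
    # Imported lazily; requires `pip install llmcompressor` (older releases
    # expose oneshot under llmcompressor.transformers instead).
    from llmcompressor import oneshot
    from llmcompressor.modifiers.quantization import GPTQModifier

    recipe = GPTQModifier(
        targets="Linear",    # quantize every Linear layer
        scheme="W4A16",      # 4-bit weights, 16-bit activations, group size 128
        ignore=["lm_head"],  # assumption: output head left unquantized
    )
    oneshot(
        model=MODEL_ID,
        dataset="openerotica/erotiquant3",
        recipe=recipe,
        max_seq_length=MAX_SEQ_LEN,
        num_calibration_samples=NUM_SAMPLES,
        output_dir="Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ",
    )

if __name__ == "__main__":
    quantize()
```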
### Dataset Preprocessing

The dataset was preprocessed with the following steps:

1. Extract and structure the conversation data using role-based templates (`SYSTEM`, `USER`, `ASSISTANT`).
2. Convert the structured conversations into a tokenized format using the model's tokenizer.
3. Filter out sequences shorter than 4096 tokens.
4. Shuffle and select 512 samples for calibration.
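The steps above can be sketched as follows. The exact role template and selection logic are assumptions for illustration (the actual code lives in `compress.py`), and the token counter is injected as a callable so the sketch stays self-contained — the real run used the model's tokenizer:

```python
import random

def render_conversation(turns):
    """Flatten [(role, text), ...] pairs into one role-tagged string (step 1).

    Assumed template: 'SYSTEM: ...' / 'USER: ...' / 'ASSISTANT: ...' lines.
    """
    return "\n".join(f"{role.upper()}: {text}" for role, text in turns)

def select_calibration(texts, count_tokens, min_tokens=4096, n=512, seed=0):
    """Drop sequences shorter than min_tokens, then shuffle and take n (steps 3-4)."""
    kept = [t for t in texts if count_tokens(t) >= min_tokens]
    random.Random(seed).shuffle(kept)
    return kept[:n]
```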

## Quantization Process

See the shell and Python scripts used to quantize this model:

- [compress.sh](./compress.sh)
- [compress.py](./compress.py)

Four A40 GPUs with 300 GB of RAM were rented on RunPod.

Quantization took approximately 11 hours, with a total of \$23.65 in compute costs. (And another \$70 from screwing up the quants like 10 times, but anyway...)
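For a rough sense of the rate those numbers imply (assuming the \$23.65 covers the full 11-hour run on the four-A40 pod):

```python
# Back-of-the-envelope rate implied by the cost figures above.
TOTAL_COST = 23.65  # USD for the successful run
HOURS = 11
NUM_GPUS = 4

pod_rate = TOTAL_COST / HOURS   # USD per hour for the whole pod (~$2.15)
gpu_rate = pod_rate / NUM_GPUS  # USD per A40-hour (~$0.54)
print(f"pod: ${pod_rate:.2f}/hr, per A40: ${gpu_rate:.2f}/hr")
```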

## Acknowledgments

- Base Model: [ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3](https://huggingface.co/ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3)
- Calibration Dataset: [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3)
- LLM Compressor: [llm-compressor](https://github.com/vllm-project/llm-compressor)
- Everyone subscribed to the [Sentient Simulations Patreon](https://www.patreon.com/SentientSims)

![patreon.PNG](https://huggingface.co/GusPuffy/Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ/resolve/main/patreon.PNG)