---
license: llama3.1
tags:
- llmcompressor
- GPTQ
datasets:
- openerotica/erotiquant3
---

<p align="center">
<img width="120px" alt="Sentient Simulations Plumbob" src="https://www.sentientsimulations.com/transparent-plumbob2.png">
</p>
<p align="center"><a href="https://www.sentientsimulations.com/">[🏠Sentient Simulations]</a> | <a href="https://discord.com/invite/JTjbydmUAp">[Discord]</a> | <a href="https://www.patreon.com/SentientSims">[Patreon]</a></p>
<hr>

# Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ

This repository contains a 4-bit GPTQ-quantized version of the [ArliAI Llama 3.1 70B model](https://huggingface.co/ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3), created with [llm-compressor](https://github.com/vllm-project/llm-compressor).
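A minimal serving sketch, assuming vLLM is installed (its GPTQ/compressed-tensors loader reads W4A16 checkpoints like this one). The prompt, sampling settings, and `tensor_parallel_size` are illustrative assumptions, not part of this repo:

```python
# Hypothetical usage sketch: serving this W4A16 GPTQ checkpoint with vLLM.
# Sampling settings and tensor_parallel_size below are illustrative only.
MODEL_ID = "GusPuffy/Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ"

def main():
    # Imported lazily; requires `pip install vllm` and enough GPU memory
    # for the ~35 GB of 4-bit weights (e.g. a multi-GPU pod).
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL_ID, max_model_len=4096, tensor_parallel_size=4)
    params = SamplingParams(temperature=0.8, max_tokens=256)
    outputs = llm.generate(["Write an opening scene for a story."], params)
    print(outputs[0].outputs[0].text)

if __name__ == "__main__":
    main()
```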

## Quantization Settings

| **Attribute** | **Value** |
|---------------------------------|------------------------------------------------------------------------------------|
| **Algorithm** | GPTQ |
| **Layers** | Linear |
| **Weight Scheme** | W4A16 |
| **Group Size** | 128 |
| **Calibration Dataset** | [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3) |
| **Calibration Sequence Length** | 4096 |
| **Calibration Samples** | 512 |

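The settings in the table map onto an llm-compressor one-shot run roughly like the sketch below. This is a hedged reconstruction, not the actual script — `compress.py` in this repo is authoritative, and argument names can differ across llm-compressor releases:

```python
# Hypothetical sketch of the one-shot GPTQ run implied by the settings table;
# see compress.py in this repo for the real code.
MODEL_ID = "ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3"
MAX_SEQ_LEN = 4096   # calibration sequence length
NUM_SAMPLES = 512    # calibration samples

def quantize():
    # Imported lazily; requires `pip install llmcompressor` (older releases
    # expose oneshot under llmcompressor.transformers instead).
    from llmcompressor import oneshot
    from llmcompressor.modifiers.quantization import GPTQModifier

    recipe = GPTQModifier(
        targets="Linear",    # quantize every Linear layer
        scheme="W4A16",      # 4-bit weights, 16-bit activations, group size 128
        ignore=["lm_head"],  # assumption: output head left unquantized
    )
    oneshot(
        model=MODEL_ID,
        dataset="openerotica/erotiquant3",
        recipe=recipe,
        max_seq_length=MAX_SEQ_LEN,
        num_calibration_samples=NUM_SAMPLES,
        output_dir="Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ",
    )

if __name__ == "__main__":
    quantize()
```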
### Dataset Preprocessing

The dataset was preprocessed with the following steps:

1. Extract and structure the conversation data using role-based templates (`SYSTEM`, `USER`, `ASSISTANT`).
2. Convert the structured conversations into a tokenized format using the model's tokenizer.
3. Filter out sequences shorter than 4096 tokens.
4. Shuffle and select 512 samples for calibration.
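The steps above can be sketched as follows. The exact role template and selection logic are assumptions for illustration (the actual code lives in `compress.py`), and the token counter is injected as a callable so the sketch stays self-contained — the real run used the model's tokenizer:

```python
import random

def render_conversation(turns):
    """Flatten [(role, text), ...] pairs into one role-tagged string (step 1).

    Assumed template: 'SYSTEM: ...' / 'USER: ...' / 'ASSISTANT: ...' lines.
    """
    return "\n".join(f"{role.upper()}: {text}" for role, text in turns)

def select_calibration(texts, count_tokens, min_tokens=4096, n=512, seed=0):
    """Drop sequences shorter than min_tokens, then shuffle and take n (steps 3-4)."""
    kept = [t for t in texts if count_tokens(t) >= min_tokens]
    random.Random(seed).shuffle(kept)
    return kept[:n]
```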

## Quantization Process

See the shell and Python scripts used to quantize this model:

- [compress.sh](./compress.sh)
- [compress.py](./compress.py)

Four A40 GPUs with 300 GB of RAM were rented on RunPod.

Quantization took approximately 11 hours, with a total of \$23.65 in compute costs. (And another \$70 from screwing up the quants like 10 times, but anyway...)
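For a rough sense of the rate those numbers imply (assuming the \$23.65 covers the full 11-hour run on the four-A40 pod):

```python
# Back-of-the-envelope rate implied by the cost figures above.
TOTAL_COST = 23.65  # USD for the successful run
HOURS = 11
NUM_GPUS = 4

pod_rate = TOTAL_COST / HOURS   # USD per hour for the whole pod (~$2.15)
gpu_rate = pod_rate / NUM_GPUS  # USD per A40-hour (~$0.54)
print(f"pod: ${pod_rate:.2f}/hr, per A40: ${gpu_rate:.2f}/hr")
```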

## Acknowledgments

- Base Model: [ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3](https://huggingface.co/ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3)
- Calibration Dataset: [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3)
- LLM Compressor: [llm-compressor](https://github.com/vllm-project/llm-compressor)
- Everyone subscribed to the [Sentient Simulations Patreon](https://www.patreon.com/SentientSims)

![patreon.PNG](https://huggingface.co/GusPuffy/Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ/resolve/main/patreon.PNG)