--- license: llama3.1 tags: - llmcompressor - GPTQ datasets: - openerotica/erotiquant3 base_model: - ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3 ---

Sentient Simulations Plumbob

[🏠Sentient Simulations] | [Discord] | [Patreon]

# Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ This repository contains a 4 bit GPTQ-quantized version of the [ArliAI Llama 3.1 70B model](https://huggingface.co/ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3) using [llm-compressor](https://github.com/vllm-project/llm-compressor). ## Quantization Settings | **Attribute** | **Value** | |---------------------------------|------------------------------------------------------------------------------------| | **Algorithm** | GPTQ | | **Layers** | Linear | | **Weight Scheme** | W4A16 | | **Group Size** | 128 | | **Calibration Dataset** | [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3) | | **Calibration Sequence Length** | 4096 | | **Calibration Samples** | 512 | ### Dataset Preprocessing The dataset was preprocessed with the following steps: 1. Extract and structure the conversation data using role-based templates (`SYSTEM`, `USER`, `ASSISTANT`). 2. Convert the structured conversations into a tokenized format using the model's tokenizer. 3. Filter out sequences shorter than 4096 tokens. 4. Shuffle and select 512 samples for calibration. ## Quantization Process View the shell and python script used to quantize this model. 4 A40s with 300gb of ram was rented on runpod. Quantization took approximately 11 hours with a total of \$23.65 in compute costs. (And another \$70 of me screwing up the quants like 10 times but anyways...) - [compress.sh](./compress.sh) - [compress.py](./compress.py) ## Acknowledgments - Base Model: [ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3](https://huggingface.co/ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3) - Calibration Dataset: [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3) - LLM Compressor: [llm-compressor](https://github.com/vllm-project/llm-compressor) - Everyone subscribed to the [Sentient Simulations Patreon](https://www.patreon.com/SentientSims) ![patreon.PNG](https://huggingface.co/GusPuffy/Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ/resolve/main/patreon.PNG)