---
license: llama3.1
tags:
- llmcompressor
- GPTQ
datasets:
- openerotica/erotiquant3
base_model:
- ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3
---
<p align="center">
<img width="120px" alt="Sentient Simulations Plumbob" src="https://www.sentientsimulations.com/transparent-plumbob2.png">
</p>
<p align="center"><a href="https://www.sentientsimulations.com/">[🏠Sentient Simulations]</a> | <a href="https://discord.com/invite/JTjbydmUAp">[Discord]</a> | <a href="https://www.patreon.com/SentientSims">[Patreon]</a></p>
<hr>
# Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ
This repository contains a 4-bit GPTQ-quantized version of the [ArliAI Llama 3.1 70B model](https://huggingface.co/ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3), produced with [llm-compressor](https://github.com/vllm-project/llm-compressor).
## Quantization Settings
| **Attribute** | **Value** |
|---------------------------------|------------------------------------------------------------------------------------|
| **Algorithm** | GPTQ |
| **Layers** | Linear |
| **Weight Scheme** | W4A16 |
| **Group Size** | 128 |
| **Calibration Dataset** | [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3) |
| **Calibration Sequence Length** | 4096 |
| **Calibration Samples** | 512 |
### Dataset Preprocessing
The dataset was preprocessed with the following steps:
1. Extract and structure the conversation data using role-based templates (`SYSTEM`, `USER`, `ASSISTANT`).
2. Convert the structured conversations into a tokenized format using the model's tokenizer.
3. Filter out sequences shorter than 4096 tokens.
4. Shuffle and select 512 samples for calibration.
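The preprocessing steps above can be sketched as follows. This is a minimal illustration with hypothetical helper names (`format_conversation`, `build_calibration_set`) and a pluggable `tokenize` callable standing in for the model's tokenizer; the authoritative logic is in `compress.py`.

```python
import random

def format_conversation(turns):
    """Flatten role-based turns into a SYSTEM/USER/ASSISTANT template."""
    return "\n".join(f"{role.upper()}: {text}" for role, text in turns)

def build_calibration_set(conversations, tokenize,
                          min_tokens=4096, num_samples=512, seed=42):
    """Tokenize, drop short sequences, shuffle, and pick calibration samples."""
    tokenized = [tokenize(format_conversation(c)) for c in conversations]
    # Step 3: keep only sequences of at least min_tokens tokens.
    long_enough = [t for t in tokenized if len(t) >= min_tokens]
    # Step 4: shuffle deterministically and select the calibration subset.
    random.Random(seed).shuffle(long_enough)
    return long_enough[:num_samples]
```

In the real script the `tokenize` callable would be the base model's tokenizer, so that sequence lengths are measured in model tokens rather than words.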
## Quantization Process
The shell and Python scripts used to quantize this model are linked below.
Four A40 GPUs with 300 GB of RAM were rented on RunPod.
Quantization took approximately 11 hours at a total of \$23.65 in compute costs. (Plus another \$70 of me screwing up the quants like 10 times, but anyways...)
- [compress.sh](./compress.sh)
- [compress.py](./compress.py)
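For orientation, the core of such a run with llm-compressor typically looks like the sketch below. This is a hedged approximation written against llm-compressor's `oneshot`/`GPTQModifier` API as I understand it, not the exact script used here; `calibration_dataset` stands in for the preprocessed samples described above, and the W4A16 scheme defaults to a group size of 128. See `compress.py` for the actual invocation.

```python
# Minimal sketch of a GPTQ W4A16 one-shot quantization run (assumed API;
# the authoritative version for this model is compress.py).
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

recipe = GPTQModifier(
    targets="Linear",     # quantize all Linear layers
    scheme="W4A16",       # 4-bit weights, 16-bit activations, group size 128
    ignore=["lm_head"],   # the output head is commonly left unquantized
)

oneshot(
    model="ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3",
    dataset=calibration_dataset,       # preprocessed calibration samples
    recipe=recipe,
    max_seq_length=4096,
    num_calibration_samples=512,
    output_dir="Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ",
)
```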
## Acknowledgments
- Base Model: [ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3](https://huggingface.co/ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3)
- Calibration Dataset: [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3)
- LLM Compressor: [llm-compressor](https://github.com/vllm-project/llm-compressor)
- Everyone subscribed to the [Sentient Simulations Patreon](https://www.patreon.com/SentientSims)
![patreon.PNG](https://huggingface.co/GusPuffy/Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ/resolve/main/patreon.PNG)