Sentient Simulations Plumbob

[🏠Sentient Simulations] | [Discord] | [Patreon]


Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ

This repository contains a 4-bit GPTQ-quantized version of the ArliAI Llama 3.1 70B RPMax model, produced with llm-compressor.

Quantization Settings

| Attribute | Value |
|---|---|
| Algorithm | GPTQ |
| Layers | Linear |
| Weight Scheme | W4A16 |
| Group Size | 128 |
| Calibration Dataset | openerotica/erotiquant3 |
| Calibration Sequence Length | 4096 |
| Calibration Samples | 512 |
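
The settings in the table correspond roughly to an llm-compressor GPTQ recipe like the one below. This is a sketch in the nested dict/YAML structure llm-compressor accepts, not the exact recipe used for this model (see the linked scripts); the `ignore` entry in particular is an assumption.

```python
# Illustrative recipe mirroring the table above. In llm-compressor the
# W4A16 scheme defaults to group size 128, matching the table.
recipe = {
    "quant_stage": {
        "quant_modifiers": {
            "GPTQModifier": {
                "targets": "Linear",    # quantize all Linear layers
                "scheme": "W4A16",      # 4-bit weights, 16-bit activations
                "ignore": ["lm_head"],  # assumption: output head left unquantized
            }
        }
    }
}

# Calibration parameters that would accompany the recipe:
calibration = {
    "dataset": "openerotica/erotiquant3",
    "max_seq_length": 4096,
    "num_calibration_samples": 512,
}
```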

Dataset Preprocessing

The dataset was preprocessed with the following steps:

  1. Extract and structure the conversation data using role-based templates (SYSTEM, USER, ASSISTANT).
  2. Convert the structured conversations into a tokenized format using the model's tokenizer.
  3. Filter out sequences shorter than 4096 tokens.
  4. Shuffle and select 512 samples for calibration.
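
Steps 1, 3, and 4 can be sketched as follows. The function names, the exact role-tag delimiters, and the fixed shuffle seed are all illustrative assumptions; the real preprocessing script may differ.

```python
import random

def format_conversation(turns):
    """Step 1: render (role, text) pairs with role-based templates.
    The exact SYSTEM/USER/ASSISTANT delimiters here are an assumption."""
    return "\n".join(f"{role}: {text}" for role, text in turns)

def select_calibration_samples(tokenized, min_len=4096, n=512, seed=42):
    """Steps 3-4: drop sequences shorter than min_len tokens, then
    shuffle and keep n samples for calibration."""
    long_enough = [seq for seq in tokenized if len(seq) >= min_len]
    rng = random.Random(seed)  # fixed seed for reproducibility (assumption)
    rng.shuffle(long_enough)
    return long_enough[:n]
```

Step 2 (tokenization) would apply the model's tokenizer to each formatted string before the length filter runs.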

Quantization Process

View the shell and Python scripts used to quantize this model.

Four A40 GPUs with 300 GB of RAM were rented on RunPod.

Quantization took approximately 11 hours, with $23.65 in compute costs (plus roughly $70 more spent on about ten failed attempts along the way).
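
As a quick sanity check, the figures above imply the following per-GPU rate:

```python
# Back-of-envelope rate implied by the numbers quoted above
# (successful run only, failed attempts excluded).
hours = 11
total_cost = 23.65  # USD
gpus = 4            # A40s
per_gpu_hour = total_cost / (hours * gpus)
print(round(per_gpu_hour, 2))  # prints 0.54 (USD per A40-hour)
```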

Acknowledgments

