OPT-13B - GPTQ

The model published in this repo was quantized to 4-bit using AutoGPTQ.

Quantization details

All quantization parameters were taken from the GPTQ paper.

The GPTQ calibration data consisted of 128 random 2048-token segments from the C4 dataset.

The group size used for quantization is 128.
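
For reference, the settings above map directly onto AutoGPTQ's quantization API. The following is a minimal sketch of how such a quantization run could look; the C4 shard, the sampling of calibration segments, and the desc_act setting are assumptions for illustration, not the exact script used for this repo.

import random

from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from datasets import load_dataset
from transformers import AutoTokenizer

base_model = "facebook/opt-13b"
tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)

# Calibration data: 128 random 2048-token segments drawn from C4 (English).
data = load_dataset(
    "allenai/c4",
    data_files={"train": "en/c4-train.00000-of-01024.json.gz"},
    split="train",
)
examples = []
while len(examples) < 128:
    text = data[random.randrange(len(data))]["text"]
    enc = tokenizer(text, return_tensors="pt")
    if enc.input_ids.shape[1] >= 2048:
        examples.append({
            "input_ids": enc.input_ids[:, :2048],
            "attention_mask": enc.attention_mask[:, :2048],
        })

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights, as stated above
    group_size=128,  # group size reported above
    desc_act=False,  # assumption: not specified in this card
)

# Quantize the base model and save the packed weights.
model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)
model.quantize(examples)
model.save_quantized("opt-13b-gptq-4bit-g128", use_safetensors=True)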

How to use this GPTQ model from Python code

Install the necessary packages:

pip install accelerate==0.26.1 datasets==2.16.1 dill==0.3.7 gekko==1.0.6 multiprocess==0.70.15 peft==0.7.1 rouge==1.0.1 sentencepiece==0.1.99
git clone https://github.com/upunaprosk/AutoGPTQ
cd AutoGPTQ
pip install -v .

Recommended transformers version: 4.35.2.
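
A quick sanity check of the environment after installation; the expected transformers version is the one recommended above.

import transformers
import auto_gptq  # importing confirms the source build of AutoGPTQ succeeded

print(transformers.__version__)  # the card recommends 4.35.2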

You can then use the following code:


from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM

pretrained_model_dir = "iproskurina/opt-13b-GPTQ-4bit-g128"

# Load the tokenizer and the 4-bit quantized model onto the first GPU.
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(pretrained_model_dir, device="cuda:0", model_basename="model")

# Generate text with a standard transformers pipeline.
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline("auto-gptq is")[0]["generated_text"])
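
If you prefer to call generate() directly rather than go through the pipeline, a minimal sketch follows; the prompt and the generation settings (max_new_tokens, sampling) are illustrative assumptions, not recommendations from this card.

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "iproskurina/opt-13b-GPTQ-4bit-g128"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0", model_basename="model")

# Tokenize a prompt, move it to the model's device, and sample a completion.
inputs = tokenizer("auto-gptq is", return_tensors="pt").to("cuda:0")
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))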

LICENSE

Run the model with GPTQModel

GPTQModel package: https://github.com/ModelCloud/GPTQModel

pip install -v gptqmodel=="1.8.0" --no-build-isolation

You can then run:

from gptqmodel import GPTQModel

model_id = 'iproskurina/opt-13b-GPTQ-4bit-g128'
model = GPTQModel.load(model_id)
result = model.generate("Uncovering deep insights")[0] # tokens
print(model.tokenizer.decode(result)) # string output
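
To control output length or sampling, the sketch below assumes that extra keyword arguments to GPTQModel's generate() are forwarded to the underlying transformers generate() call; the specific settings are illustrative.

from gptqmodel import GPTQModel

model = GPTQModel.load("iproskurina/opt-13b-GPTQ-4bit-g128")

# Cap the completion length and enable sampling; these kwargs are assumed to be
# passed through to the underlying transformers generate().
tokens = model.generate("Uncovering deep insights", max_new_tokens=64, do_sample=True, top_p=0.95)[0]
print(model.tokenizer.decode(tokens, skip_special_tokens=True))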