gpt2-large quantized to 4-bit using AutoGPTQ.

To use, first install AutoGPTQ:

pip install auto-gptq

Then load the model from the hub:

from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name = "smpanaro/gpt2-large-AutoGPTQ-4bit-128g"
model = AutoGPTQForCausalLM.from_quantized(model_name)
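
Once loaded, the model can be used for text generation. The following is a minimal sketch, not a tested recipe from this repository. The helper name `generate_text`, the choice of the base `gpt2-large` tokenizer, the `cuda:0` device, and the `max_new_tokens` value are all assumptions made for illustration.

```python
def generate_text(prompt, max_new_tokens=20, device="cuda:0"):
    """Generate a continuation of `prompt` with the 4-bit model (hypothetical helper)."""
    # Imported inside the function so that defining this helper does not
    # require the GPU-dependent packages to be importable.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    # Assumption: the base gpt2-large tokenizer is used, since quantization
    # changes the weights but not the vocabulary.
    tokenizer = AutoTokenizer.from_pretrained("gpt2-large")

    # Assumption: a CUDA device is available; adjust `device` otherwise.
    model = AutoGPTQForCausalLM.from_quantized(
        "smpanaro/gpt2-large-AutoGPTQ-4bit-128g", device=device
    )

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Hello, my name is")` should return the prompt followed by a short continuation. The exact text depends on the generation settings.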

| Model | 4-Bit Perplexity | 16-Bit Perplexity | Delta |
|---|---|---|---|
| smpanaro/gpt2-AutoGPTQ-4bit-128g | 26.5000 | 25.1875 | 1.3125 |
| smpanaro/gpt2-medium-AutoGPTQ-4bit-128g | 19.1719 | 18.4739 | 0.6980 |
| smpanaro/gpt2-large-AutoGPTQ-4bit-128g | 16.6875 | 16.4541 | 0.2334 |
| smpanaro/gpt2-xl-AutoGPTQ-4bit-128g | 14.9297 | 14.7951 | 0.1346 |

Wikitext perplexity, measured as in the Hugging Face docs. Lower is better.
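
For readers unfamiliar with the metric, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows that definition and recomputes the Delta column, plus a relative increase, from the numbers in the table. It does not reproduce the evaluation procedure used for this model card. The function names and the `results` dictionary are illustrative only.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood) over the given tokens."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def relative_increase(ppl_4bit, ppl_16bit):
    """Perplexity increase of the 4-bit model, as a percentage of the 16-bit value."""
    return (ppl_4bit - ppl_16bit) / ppl_16bit * 100.0


# (4-bit perplexity, 16-bit perplexity), copied from the table above.
results = {
    "gpt2": (26.5000, 25.1875),
    "gpt2-medium": (19.1719, 18.4739),
    "gpt2-large": (16.6875, 16.4541),
    "gpt2-xl": (14.9297, 14.7951),
}

for name, (ppl_4bit, ppl_16bit) in results.items():
    delta = ppl_4bit - ppl_16bit
    rel = relative_increase(ppl_4bit, ppl_16bit)
    print(f"{name}: delta={delta:.4f}, relative increase={rel:.2f}%")
```

Both the absolute delta and the relative increase shrink as the model size grows.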
