pythia-160m quantized to 4-bit using AutoGPTQ.

To use, first install AutoGPTQ:

pip install auto-gptq

Then load the model from the hub:

from auto_gptq import AutoGPTQForCausalLM

# Hub repository that holds the 4-bit quantized weights.
model_name = "smpanaro/pythia-160m-AutoGPTQ-4bit-128g"
# Load the already-quantized checkpoint for inference.
model = AutoGPTQForCausalLM.from_quantized(model_name)
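
Once the model is loaded, generating text could look roughly like the sketch below. It is not taken from the original card: the helper name `generate_text`, the `cuda:0` device, and the choice to load the tokenizer from the base `EleutherAI/pythia-160m` repository are all assumptions made for illustration.

```python
def generate_text(prompt, max_new_tokens=32, device="cuda:0"):
    """Hypothetical helper: load the quantized model and continue a prompt."""
    # Imported inside the function so that defining it does not require
    # the libraries (or a GPU) to be present.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    model_name = "smpanaro/pythia-160m-AutoGPTQ-4bit-128g"
    # Assumption: the tokenizer of the base model is used, since
    # quantization does not change the vocabulary.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
    # Assumption: a CUDA device is available under the given name.
    model = AutoGPTQForCausalLM.from_quantized(model_name, device=device)

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # AutoGPTQ's generate takes keyword arguments, so unpack the inputs.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("The quick brown fox")` should return the prompt followed by a short continuation. A 160M-parameter model at 4-bit precision will not produce polished text.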

| Model | 4-Bit Perplexity | 16-Bit Perplexity | Delta |
|---|---|---|---|
| smpanaro/pythia-160m-AutoGPTQ-4bit-128g | 33.4375 | 23.3024 | 10.1351 |

Wikitext perplexity, measured as described in the Hugging Face docs; lower is better.
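
For readers unfamiliar with the metric, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below illustrates only that formula and the Delta column; it does not reproduce the evaluation procedure from the Hugging Face docs, and the toy log-probabilities are made up for the example.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy data (hypothetical): every token was assigned probability 0.25,
# so the model is as uncertain as a uniform choice among 4 options.
toy_log_probs = [math.log(0.25)] * 4
toy_ppl = perplexity(toy_log_probs)

# The Delta column is the 4-bit perplexity minus the 16-bit perplexity.
delta = round(33.4375 - 23.3024, 4)
```

A lower perplexity means the model assigns higher probability to the reference text, so a positive delta reflects the quality cost of quantization.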