Bug in parameter count for GPTQ quantized models

#1068
by Qubitium - opened

@alozowski I think I found a bug related to GPTQ (and likely other quantized models) in the Open LLM Leaderboard: the parameter count appears to be off by a factor of ~4. The model below is a GPTQ quant of Llama 3.2-1B Instruct, but the parameter filter will not show it until I raise the upper bound to ~6B in the UI.

Leaderboard UI: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?params=-1%2C6&search=modelcloud
Model: https://huggingface.co/ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v1

Open LLM Leaderboard org

Hi @Qubitium ,

Thanks for opening the discussion here! Currently, the Leaderboard calculates the number of parameters for GPTQ models using this method (line 118). Could you suggest any improvements to this calculation? It would be appreciated!

@alozowski The current code assumes a packing factor of 8, which means it assumes all GPTQ quants are 4-bit (int4), with eight int4 values packed into one int32. But GPTQ supports multiple bit widths, including 2, 3, 4, and 8, so the factor cannot be hardcoded. That is one bug.
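To make the packing math concrete, here is a minimal sketch (the helper name is hypothetical, not the leaderboard's code) of how the unpacked weight count depends on the bit width. Note that 3-bit packing is irregular (32 values span 3 int32s) and cannot use a simple integer factor:

def unpacked_param_count(packed_numel: int, bits: int) -> int:
    """Original weight count represented by a packed int32 qweight tensor."""
    if 32 % bits != 0:
        # 3-bit GPTQ packs 32 values into 3 int32s; it needs special handling.
        raise ValueError(f"{bits}-bit packing is irregular")
    pack_factor = 32 // bits  # 16 for 2-bit, 8 for 4-bit, 4 for 8-bit
    return packed_numel * pack_factor

# The same packed tensor represents twice as many weights at 4-bit as at 8-bit:
print(unpacked_param_count(1_048_576, bits=4))  # 8388608
print(unpacked_param_count(1_048_576, bits=8))  # 4194304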

But even if the current code does, on the surface, correctly handle the 4-bit packing, I still don't know why the end value is ~4x higher than reality. I may have more time on Monday to fix this if you haven't fixed it already.

@alozowski We recommend the following:

  1. If the LLM benchmark is using HF transformers for inference, please import and use gptqmodel so the latest Marlin kernel is auto-selected for faster results (see the loading sketch after the parameter-count example below).
  2. Import gptqmodel and use the util code to accurately calculate the params:

https://github.com/ModelCloud/GPTQModel/blob/main/gptqmodel/utils/tensor.py

Call the above util in the parameter loop, passing in the tensor name, tensor.shape, and the GPTQ bit value, then sum everything. This will return the correct GPTQ model parameter count:

import os.path

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from gptqmodel import QuantizeConfig
from gptqmodel.utils.tensor import tensor_parameters

model_id = "ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v1"

# Download the quantized weights and config from the Hub.
file_path = hf_hub_download(model_id, filename="model.safetensors")
config_path = hf_hub_download(model_id, filename="config.json")

safetensors_obj = load_file(file_path)
# Read the GPTQ bit width (2/3/4/8) from the model's quantize config.
quantize_config = QuantizeConfig.from_pretrained(os.path.dirname(config_path))

total_params = 0

# tensor_parameters maps each (possibly packed) tensor back to its original
# parameter count, taking the bit width into account.
for name, tensor in safetensors_obj.items():
    param_count = tensor_parameters(name, tensor.shape, bits=quantize_config.bits)
    total_params += param_count

print(f"total_params: {total_params / 1e9} B")
