Bug in parameter count for GPTQ quantized models

#1068
by Qubitium - opened

@alozowski I think I found a bug related to GPTQ (and likely other quantized models) in the Open LLM Leaderboard: the parameter count appears to be off by a factor of ~4. The model below is a GPTQ quant of Llama 3.2-1B Instruct, but the parameter filter will not show it until I raise the upper bound to ~6B in the UI.

Leaderboard UI: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?params=-1%2C6&search=modelcloud
Model: https://huggingface.co/ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v1

Open LLM Leaderboard org

Hi @Qubitium ,

Thanks for opening the discussion here! Currently, the Leaderboard calculates the number of parameters for GPTQ models using this method (line 118). Could you suggest any improvements to this calculation? It would be appreciated!

@alozowski The current code assumes a packing factor of 8, which means it assumes all GPTQ quants are 4-bit (int4), with eight int4 values packed into one int32. But GPTQ supports multiple bit widths, including 2, 3, 4, and 8, so the factor cannot be hardcoded. That is one bug.
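To make the packing math concrete, here is a minimal sketch (the helper name is hypothetical, not the leaderboard's code) of how the unpacked weight count depends on the bit width. Note that 3-bit packing is irregular (32 values span 3 int32s) and cannot use a simple integer factor:

def unpacked_param_count(packed_numel: int, bits: int) -> int:
    """Original weight count represented by a packed int32 qweight tensor."""
    if 32 % bits != 0:
        # 3-bit GPTQ packs 32 values into 3 int32s; it needs special handling.
        raise ValueError(f"{bits}-bit packing is irregular")
    pack_factor = 32 // bits  # 16 for 2-bit, 8 for 4-bit, 4 for 8-bit
    return packed_numel * pack_factor

# The same packed tensor represents twice as many weights at 4-bit as at 8-bit:
print(unpacked_param_count(1_048_576, bits=4))  # 8388608
print(unpacked_param_count(1_048_576, bits=8))  # 4194304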

But even if the current code does, on the surface, correctly handle the 4-bit packing, I still don't know why the end value is ~4x higher than reality. I may have more time on Monday to fix this if you haven't fixed it already.

@alozowski We recommend the following:

  1. If the LLM benchmark is using HF transformers for inference, please import and use gptqmodel so the latest Marlin kernel is auto-selected for faster results (see the loading sketch after the parameter-count example below).
  2. Import gptqmodel and use the util code to accurately calculate the params:

https://github.com/ModelCloud/GPTQModel/blob/main/gptqmodel/utils/tensor.py

Call the above util in the parameter loop, passing in the tensor name, tensor.shape, and the GPTQ bit value, then sum everything. This will return the correct GPTQ model parameter count:

import os.path

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from gptqmodel import QuantizeConfig
from gptqmodel.utils.tensor import tensor_parameters

model_id = "ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v1"

# Download the quantized weights and config from the Hub.
file_path = hf_hub_download(model_id, filename="model.safetensors")
config_path = hf_hub_download(model_id, filename="config.json")

safetensors_obj = load_file(file_path)
# Read the GPTQ bit width (2/3/4/8) from the model's quantize config.
quantize_config = QuantizeConfig.from_pretrained(os.path.dirname(config_path))

total_params = 0

# tensor_parameters maps each (possibly packed) tensor back to its original
# parameter count, taking the bit width into account.
for name, tensor in safetensors_obj.items():
    param_count = tensor_parameters(name, tensor.shape, bits=quantize_config.bits)
    total_params += param_count

print(f"total_params: {total_params / 1e9} B")
