Model Details

This model is an INT4 quantization of THUDM/glm-4v-9b, generated by intel/auto-round using symmetric quantization with a group size of 128. To load the AutoGPTQ-format version instead, pass revision="dbb7900".
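As a rough illustration of what "INT4, group size 128, symmetric" means, the sketch below splits the weights into groups of 128. Each group gets one scale derived from its largest absolute value, and every weight is stored as a signed 4-bit integer with no zero-point. This is a minimal round-to-nearest sketch in pure Python with hypothetical function names. It is not the AutoRound algorithm, which additionally tunes the rounding with signed gradient descent (arXiv:2309.05516). The exact scale convention used by the released checkpoint may also differ.

```python
from typing import List, Tuple

QMAX = 7   # largest signed 4-bit value
QMIN = -8  # smallest signed 4-bit value


def quantize_sym_int4(weights: List[float], group_size: int = 128) -> Tuple[List[int], List[float]]:
    """Round-to-nearest symmetric INT4 quantization with one scale per group (illustrative only)."""
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        absmax = max(abs(w) for w in group)
        # Symmetric: the scale maps the group's largest magnitude to QMAX, and there is no zero-point
        scale = absmax / QMAX if absmax > 0 else 1.0
        scales.append(scale)
        for w in group:
            # Round to the nearest integer, then clamp into the INT4 range
            q.append(max(QMIN, min(QMAX, round(w / scale))))
    return q, scales


def dequantize_sym_int4(q: List[int], scales: List[float], group_size: int = 128) -> List[float]:
    """Recover approximate weights: each integer code times its group's scale."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]


# Toy example with group_size=2 so the two groups are easy to follow
q, scales = quantize_sym_int4([0.875, -0.25, -0.875, 0.5], group_size=2)
print(q, scales)  # [7, -2, -7, 4] [0.125, 0.125]
print(dequantize_sym_int4(q, scales, group_size=2))  # [0.875, -0.25, -0.875, 0.5]
```

Real weights are rarely exact multiples of the scale, so dequantization normally leaves a rounding error of up to half a scale step per weight. Smaller groups shrink that error at the cost of storing more scales.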

How To Use

INT4 Inference

import torch
import requests
from PIL import Image
from auto_round import AutoRoundConfig  # must be imported to load the auto-round format
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "OPEA/glm-4v-9b-int4-sym-inc"
DEVICE = 'cuda'

# The tokenizer is loaded from the original (unquantized) model repository
tokenizer = AutoTokenizer.from_pretrained(
    "THUDM/glm-4v-9b",
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype="auto",
    trust_remote_code=True,
    device_map=DEVICE,  # places the whole model on DEVICE, so no extra .to() call is needed
    # revision="dbb7900",  # uncomment to load the AutoGPTQ format instead
).eval()

content = '描述这张图片'  # "Describe this image"
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"

# Preparation for inference: download the image and build the chat-formatted prompt
image = Image.open(requests.get(image_url, stream=True).raw).convert('RGB')
inputs = tokenizer.apply_chat_template([{"role": "user", "image": image, "content": content}],
                                       add_generation_prompt=True, tokenize=True, return_tensors="pt",
                                       return_dict=True)  # chat mode

inputs = inputs.to(DEVICE)  # move the input tensors to the model's device
gen_kwargs = {"max_length": 2500, "do_sample": False, "top_k": 1}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    # Strip the prompt tokens so that only the newly generated answer is decoded
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))

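##INT4 (translated from Chinese):
## This is a still from the animated film "Peter Rabbit". In the picture, the protagonist Peter Rabbit, wearing a blue coat, a brown vest and light yellow trousers,
## stands on a path leading to a country cottage. Behind him is a stone house, surrounded by green grass and colorful flowers. Rolling mountains can be seen in the distance.

##BF16 (translated from Chinese):
## This is a picture of the animated character Peter Rabbit standing on a country path. He is wearing a blue coat, a brown vest and khaki trousers, and looks very
## formal. Behind him is a winding path lined with colorful flowers and green grass. At the end of the path is an old stone house, and the surroundings are quiet and beautiful.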

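# Example 2: rerun the code above with these inputs
image_url = "http://images.cocodataset.org/train2017/000000411975.jpg"
content = "图片中的棒球场上有多少人?"  # "How many people are on the baseball field in the picture?"
##INT4 (translated from Chinese):
## There are four people in the picture: two of them are holding baseball bats, one is standing, and one is crouching.

##BF16 (translated from Chinese):
## There are four people in the picture: two are children bending over to pick something up, one may be a teacher or a parent, and one is the photographer.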

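# Example 3: rerun the code above with these inputs
image_url = "https://intelcorp.scene7.com/is/image/intelcorp/processor-overview-framed-badge:1920-1080?wid=480&hei=270"
content = "这张图片代表哪家公司?"  # "Which company does this image represent?"
##INT4 (translated from Chinese):
## This image represents Intel. Intel Inside is the marketing brand Intel uses for its processor products. Since its launch in 1991, this logo has become synonymous
## with high-performance computing and appears on numerous computers equipped with Intel processors.

##BF16 (translated from Chinese):
## This image represents Intel. Intel Inside is the marketing program Intel adopted for its processor products. Since its launch in 1991, the program has become
## synonymous with personal computer performance, signaling that computers with Intel Inside processors offer better performance and reliability. Intel is one of the
## world's leading semiconductor chip manufacturers, headquartered in Santa Clara, California, USA.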
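The `gen_kwargs` in the inference code request greedy decoding. With `do_sample=False`, `generate` picks the highest-scoring token at every step, so the `top_k=1` entry has no extra effect, and transformers may warn that it is ignored. Greedy decoding keeps the INT4 and BF16 outputs above deterministic, at least in principle, which makes them easier to compare. The toy sketch below shows a single decoding step and why `top_k=1` would reproduce the greedy choice even if sampling were enabled. It is pure Python with hypothetical function names, not the transformers implementation.

```python
import math
import random
from typing import List


def greedy_step(logits: List[float]) -> int:
    """Greedy decoding step (do_sample=False): pick the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample_step(logits: List[float], k: int, rng: random.Random) -> int:
    """Top-k sampling step: keep the k best token ids, weight them by softmax, sample one."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    weights = [math.exp(logits[i] - m) for i in top]  # unnormalized softmax; choices() normalizes
    return rng.choices(top, weights=weights, k=1)[0]


logits = [0.1, 2.5, -1.0, 1.9]  # toy scores for a 4-token vocabulary
print(greedy_step(logits))  # 1
# With k=1 only the top token survives, so sampling always returns the greedy choice
print({top_k_sample_step(logits, k=1, rng=random.Random(seed)) for seed in range(20)})  # {1}
```

With `k > 1` the sampled token would vary with the seed. That variation is the nondeterminism that `do_sample=False` avoids.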

Generate the model

Here is the sample command to reproduce the model.

pip install auto-round
auto-round-mllm \
--model THUDM/glm-4v-9b \
--device 0 \
--group_size 128 \
--bits 4 \
--iters 1000 \
--nsample 512 \
--seqlen 2048 \
--format 'auto_gptq,auto_round' \
--output_dir "./tmp_autoround"
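The `--bits 4` and `--group_size 128` flags above also determine the storage overhead of the per-group metadata. The back-of-the-envelope sketch below assumes one 16-bit scale per group. It can optionally count a packed zero-point, which some GPTQ-style formats store even for symmetric quantization. Under those assumptions, the average cost per weight is bits + metadata_bits / group_size. The helper names are hypothetical. The estimate ignores any layers left unquantized, so it likely underestimates the real checkpoint size and is not a measurement of this model.

```python
def bits_per_weight(bits: int = 4, group_size: int = 128, scale_bits: int = 16, zero_bits: int = 0) -> float:
    """Average storage per weight: the integer code plus per-group metadata spread over the group."""
    return bits + (scale_bits + zero_bits) / group_size


def estimated_gb(num_params: float, bpw: float) -> float:
    """Rough storage for num_params weights at bpw bits each, in decimal gigabytes."""
    return num_params * bpw / 8 / 1e9


print(bits_per_weight())                         # 4.125 (one FP16 scale per 128 weights)
print(bits_per_weight(zero_bits=4))              # 4.15625 (plus a packed 4-bit zero-point)
print(estimated_gb(1e9, bits_per_weight()))      # 0.515625 GB per billion quantized weights
print(estimated_gb(1e9, 16))                     # 2.0 GB per billion BF16 weights
```

A larger group size lowers this overhead but gives each scale more weights to cover, which tends to increase quantization error. A group size of 128 is a common compromise.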

Ethical Considerations and Limitations

The model can produce factually incorrect output and should not be relied on for factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, this model may generate lewd, biased, or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here are a couple of useful links to learn more about Intel's AI software:

  • Intel Neural Compressor link

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

