Phi-3.5-vision-instruct-int4-ov

Description

This is the microsoft/Phi-3.5-vision-instruct model converted to the OpenVINO™ IR (Intermediate Representation) format, with weights compressed to INT4 using Activation-aware Weight Quantization (AWQ) via NNCF.

Quantization Parameters

Weight compression was performed using nncf.compress_weights with the following parameters:

  • mode: INT4_ASYM
  • ratio: 1.0
  • group_size: 128
  • awq: True
  • dataset: contextual
  • num_samples: 32
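
For reference, a compression with these parameters could be reproduced along the following lines using Optimum Intel's weight quantization config. This is a minimal sketch, not the exact script used to produce this model: the parameter names follow the optimum-intel OVWeightQuantizationConfig API, and the output directory name is illustrative.

from optimum.intel import OVWeightQuantizationConfig
from optimum.intel.openvino import OVModelForVisualCausalLM

# Mirror the parameters listed above: 4-bit asymmetric weights (INT4_ASYM),
# full compression ratio, group size 128, AWQ with 32 calibration samples.
q_config = OVWeightQuantizationConfig(
    bits=4,
    sym=False,             # asymmetric quantization -> INT4_ASYM
    ratio=1.0,
    group_size=128,
    quant_method="awq",
    dataset="contextual",  # built-in calibration set for vision-language models
    num_samples=32,
)

model = OVModelForVisualCausalLM.from_pretrained(
    "microsoft/Phi-3.5-vision-instruct",
    export=True,
    quantization_config=q_config,
    trust_remote_code=True,
)
model.save_pretrained("Phi-3.5-vision-instruct-int4-ov")  # illustrative output dir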

Compatibility

The provided OpenVINO™ IR model is compatible with:

  • OpenVINO version 2025.0.0 and higher
  • Optimum Intel 1.21.0 and higher
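
To verify that the installed packages meet these requirements, the versions can be checked with importlib.metadata (package names as published on PyPI):

from importlib.metadata import version

# Both packages must meet the minimum versions listed above.
print("openvino:", version("openvino"))           # expect >= 2025.0.0
print("optimum-intel:", version("optimum-intel")) # expect >= 1.21.0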

Running Model Inference with Optimum Intel

  1. Install packages required for using Optimum Intel integration with the OpenVINO backend:

pip install --pre -U --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release openvino_tokenizers openvino

pip install git+https://github.com/huggingface/optimum-intel.git

  2. Run model inference:
from PIL import Image
import requests
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor, TextStreamer

model_id = "OpenVINO/Phi-3.5-vision-instruct-int4-ov"

# The processor handles both image preprocessing and tokenization.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

ov_model = OVModelForVisualCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Phi-3.5-vision expects image placeholders of the form <|image_N|> in the prompt.
prompt = "<|image_1|>\nWhat is unusual in this picture?"

url = "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11"
image = Image.open(requests.get(url, stream=True).raw)

inputs = ov_model.preprocess_inputs(text=prompt, image=image, processor=processor)

generation_args = {
    "max_new_tokens": 50,
    "temperature": 0.0,
    "do_sample": False,
    # Stream generated tokens to stdout as they are produced.
    "streamer": TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True),
}

generate_ids = ov_model.generate(
    **inputs,
    eos_token_id=processor.tokenizer.eos_token_id,
    **generation_args,
)

# Strip the prompt tokens so only the generated answer is decoded.
generate_ids = generate_ids[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(
    generate_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]
print(response)
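
By default the model is compiled for CPU. Inference can be moved to another OpenVINO device with the standard Optimum Intel device API; a minimal sketch, assuming an Intel GPU is present on the machine:

# Recompile the already-loaded model for another OpenVINO device; any device
# string supported by OpenVINO (e.g., "CPU", "GPU", "AUTO") can be used.
ov_model.to("GPU")
ov_model.compile()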

Limitations

Check the original model card for limitations.

Legal information

The original model is distributed under the MIT license. More details can be found in the original model card.
