---
license: mit
---

# **Phi-4 OpenVINO INT4 Model**

Note: This is an unofficial version, intended only for testing and development.

This is the INT4-quantized OpenVINO format of Microsoft Phi-4. You can use it with the Intel OpenVINO SDK.

```bash
optimum-cli export openvino --model ".\Your Phi-4 path" --task text-generation-with-past --weight-format int4 --group-size 128 --ratio 0.6 --sym --trust-remote-code ".\Your output Phi-4 OpenVINO location"
```

## **Sample Code**

```python
from transformers import AutoConfig, AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

model_dir = 'Your Phi-4 OpenVINO Path'

# Runtime settings: favor low latency with a single inference stream;
# the empty CACHE_DIR leaves compiled-model caching off.
ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

ov_model = OVModelForCausalLM.from_pretrained(
    model_dir,
    device='GPU',
    ov_config=ov_config,
    config=AutoConfig.from_pretrained(model_dir, trust_remote_code=True),
    trust_remote_code=True,
)

tok = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# The prompt already contains the chat tags, so no extra special tokens are added.
tokenizer_kwargs = {"add_special_tokens": False}

prompt = "<|user|>\nI have $20,000 in my savings account, where I receive a 4% profit per year and payments twice a year. Can you please tell me how long it will take for me to become a millionaire? Also, can you please explain the math step by step as if you were explaining it to an uneducated person?\n<|end|><|assistant|>\n"

input_tokens = tok(prompt, return_tensors="pt", **tokenizer_kwargs)

answer = ov_model.generate(**input_tokens, max_new_tokens=1024)

# Print the decoded text so the result is also visible when run as a script.
print(tok.batch_decode(answer, skip_special_tokens=True)[0])
```
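
To give a feel for what the `--weight-format int4 --sym --group-size 128` flags in the export command ask for, here is a minimal pure-Python sketch of symmetric group-wise 4-bit quantization. It only illustrates the general idea (one scale per group of 128 weights, no zero point) and is not how OpenVINO or NNCF implement it internally. The helper names `quantize_group_sym_int4`, `quantize_row` and `dequantize_row` are hypothetical, and the sample weights are made up. The `--ratio 0.6` flag is not modeled here; as described in the Optimum Intel documentation, it controls roughly what share of the layers is compressed to INT4, with the rest kept at INT8.

```python
def quantize_group_sym_int4(group):
    """Quantize one group of weights to signed 4-bit integers with a single scale."""
    max_abs = max(abs(w) for w in group)
    # Map the largest magnitude to 7, the top of the signed 4-bit range [-8, 7].
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in group]
    return quantized, scale


def quantize_row(weights, group_size=128):
    """Split a row of weights into fixed-size groups and quantize each one."""
    groups = [weights[i:i + group_size] for i in range(0, len(weights), group_size)]
    return [quantize_group_sym_int4(g) for g in groups]


def dequantize_row(packed):
    """Reconstruct approximate float weights from (integers, scale) pairs."""
    return [q * scale for quantized, scale in packed for q in quantized]


# Made-up weights in [-0.5, 0.5], two groups of 128 values each.
weights = [((i * 37) % 101 - 50) / 100 for i in range(256)]

packed = quantize_row(weights, group_size=128)
restored = dequantize_row(packed)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(f"groups: {len(packed)}, max reconstruction error: {max_err:.4f}")
```

A smaller group size gives each scale fewer weights to cover, which usually lowers the reconstruction error at the cost of storing more scales.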