Phi-4 ZeroWw quantizations

  • For q4_k: output and embed tensors quantized to q8_0, all other tensors quantized to q4_k.
  • For q5_k, q6_k and q8_0: output and embed tensors kept at bf16, all other tensors quantized to q5_k, q6_k or q8_0 respectively.
  • For q8_0 --pure: all tensors quantized to q8_0 (no output/embed override).
  • The full BF16 conversion and imatrix variants of q5_k and q6_k are also available; a sketch for checking the resulting per-tensor types follows this list.
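The per-tensor layout can be checked directly from the resulting GGUF files. Below is a minimal sketch, assuming the gguf Python package that ships with llama.cpp (`pip install gguf`); the file name is one of the quants listed in the table below.

```python
# Minimal sketch: list every tensor and its quantization type, so the
# q8_0/bf16 output and embedding overrides can be verified.
# Assumption: the `gguf` Python package from llama.cpp is installed
# (`pip install gguf`).
from gguf import GGUFReader

reader = GGUFReader("phi-4.q8.q4.gguf")
for tensor in reader.tensors:
    # tensor_type is a GGMLQuantizationType enum (Q4_K, Q8_0, BF16, ...)
    print(f"{tensor.name:40s} {tensor.tensor_type.name}")
```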
| Quant | Type | File size | VRAM* |
|---|---|---|---|
| phi-4.q8.q4 | 4 bits per weight | 9.43 GB | 12.9 GB |
| phi-4.bf16.q5 | 5 bits per weight | 11.9 GB | 14.2 GB |
| phi-4.bf16.q5.im | 5 bits per weight | 11.9 GB | 14.2 GB |
| phi-4.bf16.q6 | 6 bits per weight | 13.2 GB | 15.5 GB |
| phi-4.bf16.q6.im | 6 bits per weight | 13.2 GB | 15.5 GB |
| phi-4.bf16.q8 | 8 bits per weight | 16.5 GB | 18.5 GB |
| phi-4.bf16.q8p | 8 bits per weight | 15.6 GB | 18.6 GB |
| phi-4.bf16 | 16 bits per weight | 29.3 GB | tbd |

*Approximate values at 16k context with an FP16 KV cache.
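As a rough cross-check of the table, the effective bits per weight can be derived from the file sizes and the 14.7B parameter count reported for the GGUF files further down. The sketch below assumes decimal gigabytes and ignores metadata overhead, so the numbers are approximate.

```python
# Back-of-the-envelope check: effective bits per weight from file size.
# Assumptions: 14.7e9 parameters, sizes in decimal GB as listed above.
# Effective bpw exceeds the nominal figure because output/embedding
# tensors are kept at q8_0 or bf16 and k-quants store per-block scales.
N_PARAMS = 14.7e9

sizes_gb = {
    "phi-4.q8.q4": 9.43,    # nominal 4 bpw
    "phi-4.bf16.q6": 13.2,  # nominal 6 bpw
    "phi-4.bf16.q8": 16.5,  # nominal 8 bpw
    "phi-4.bf16": 29.3,     # 16 bpw reference
}

for name, gb in sizes_gb.items():
    bpw = gb * 1e9 * 8 / N_PARAMS
    print(f"{name:15s} ~{bpw:.2f} effective bits per weight")
```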


ZeroWw quantization: huggingface.co/RobertSinclair

python convert_hf_to_gguf.py --outtype bf16 phi-4 --outfile phi-4.bf16.gguf
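The conversion command above assumes the original microsoft/phi-4 checkpoint is already present in a local phi-4 directory. One way to fetch it, shown as a sketch using huggingface_hub (not part of the original recipe):

```python
# Sketch only: download the base checkpoint into the phi-4/ directory
# expected by convert_hf_to_gguf.py. Assumes huggingface_hub is installed.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="microsoft/phi-4", local_dir="phi-4")
```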

llama-quantize --allow-requantize --output-tensor-type q8_0 --token-embedding-type q8_0 phi-4.bf16.gguf phi-4.q8.q4.gguf q4_k
llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q5.gguf q5_k
llama-quantize --imatrix imatrix.dat --leave-output-tensor phi-4.bf16.gguf phi-4.bf16.q5.im.gguf q5_k
llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q6.gguf q6_k
llama-quantize --imatrix imatrix.dat --leave-output-tensor phi-4.bf16.gguf phi-4.bf16.q6.im.gguf q6_k
llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q8.gguf q8_0
llama-quantize --allow-requantize --pure phi-4.bf16.gguf phi-4.bf16.q8p.gguf q8_0
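The two *.im quants reference an imatrix.dat that is assumed to have been generated beforehand (llama.cpp's llama-imatrix tool with a calibration text is the usual route). For convenience, the sketch below simply replays the llama-quantize invocations above from Python; it assumes llama-quantize is on PATH and that phi-4.bf16.gguf and imatrix.dat exist in the working directory.

```python
# Sketch: replay the quantization commands above via subprocess.
# Assumptions: llama-quantize is on PATH; phi-4.bf16.gguf and imatrix.dat
# are in the current directory.
import subprocess

JOBS = [
    # (flags, output file, quant type) -- mirrors the commands above
    (["--allow-requantize", "--output-tensor-type", "q8_0", "--token-embedding-type", "q8_0"],
     "phi-4.q8.q4.gguf", "q4_k"),
    (["--allow-requantize", "--output-tensor-type", "bf16", "--token-embedding-type", "bf16"],
     "phi-4.bf16.q5.gguf", "q5_k"),
    (["--imatrix", "imatrix.dat", "--leave-output-tensor"],
     "phi-4.bf16.q5.im.gguf", "q5_k"),
    (["--allow-requantize", "--output-tensor-type", "bf16", "--token-embedding-type", "bf16"],
     "phi-4.bf16.q6.gguf", "q6_k"),
    (["--imatrix", "imatrix.dat", "--leave-output-tensor"],
     "phi-4.bf16.q6.im.gguf", "q6_k"),
    (["--allow-requantize", "--output-tensor-type", "bf16", "--token-embedding-type", "bf16"],
     "phi-4.bf16.q8.gguf", "q8_0"),
    (["--allow-requantize", "--pure"],
     "phi-4.bf16.q8p.gguf", "q8_0"),
]

for flags, outfile, qtype in JOBS:
    subprocess.run(["llama-quantize", *flags, "phi-4.bf16.gguf", outfile, qtype], check=True)
```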

Phi-4 Model Card

Phi-4 Technical Report

Model Summary

Developers: Microsoft Research

Description: phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning. phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.

Architecture: 14B parameters, dense decoder-only Transformer model

Context length: 16384 tokens

Usage

Input Formats

Given the nature of the training data, phi-4 is best suited for prompts using the chat format as follows:

<|im_start|>system<|im_sep|>
You are a medieval knight and must provide explanations to modern people.<|im_end|>
<|im_start|>user<|im_sep|>
How should I explain the Internet?<|im_end|>
<|im_start|>assistant<|im_sep|>
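
A small illustration of assembling this prompt string programmatically. The helper name is hypothetical, and the exact whitespace handling in the official chat template may differ; this simply follows the layout shown above.

```python
# Sketch: build a phi-4 chat prompt following the format shown above.
# build_phi4_prompt is a hypothetical helper, not part of any library.
def build_phi4_prompt(messages: list[dict[str, str]]) -> str:
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}<|im_sep|>\n{msg['content']}<|im_end|>\n")
    # Leave the assistant turn open so the model generates the reply.
    parts.append("<|im_start|>assistant<|im_sep|>\n")
    return "".join(parts)

prompt = build_phi4_prompt([
    {"role": "system", "content": "You are a medieval knight and must provide explanations to modern people."},
    {"role": "user", "content": "How should I explain the Internet?"},
])
print(prompt)
```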
GGUF: 14.7B params, phi3 architecture
Model tree for cmh/phi-4_ZeroWw

Base model: microsoft/phi-4