Phi-4 bnb (4-bit Quantized)

Model Description

This is a 4-bit quantized version of the Phi-4 transformer model, optimized for efficient inference while maintaining performance.

  • Base Model: Phi-4
  • Quantization: bitsandbytes (bnb), 4-bit
  • Format: safetensors
  • Tokenizer: Uses standard vocab.json and merges.txt
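Since the repo already stores the quantized weights, loading should work with a plain `from_pretrained` call, with `transformers` picking up the quantization settings from the checkpoint's config. A minimal sketch (assumes `transformers`, `accelerate`, and `bitsandbytes` are installed; the helper name is illustrative, not part of the repo):

```python
def load_phi4_4bit(model_id: str = "fhamborg/phi-4-4bit-bnb"):
    """Load the pre-quantized checkpoint and its tokenizer.

    The repo stores 4-bit bitsandbytes weights, so no explicit
    quantization_config should be needed at load time.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" places the weights on the available GPU(s).
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return model, tokenizer
```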

Intended Use

  • Fast inference with minimal VRAM usage
  • Deployment in resource-constrained environments
  • Optimized for low-latency text generation
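To make "minimal VRAM usage" concrete, here is a back-of-envelope estimate of weight memory for an 8.06B-parameter model (the parameter count is taken from the card; the figures cover weights only, ignoring activations and KV cache, so real usage is somewhat higher):

```python
PARAMS = 8.06e9  # parameter count from the model card

def weight_gib(bits_per_param: float) -> float:
    """Approximate weight memory in GiB (weights only)."""
    return PARAMS * bits_per_param / 8 / 2**30

# fp16 weights: ~15.0 GiB; 4-bit weights: ~3.8 GiB
fp16_gib = weight_gib(16)
four_bit_gib = weight_gib(4)
```

This roughly 4x reduction is what allows the model to fit on consumer GPUs with 8 GB or more of VRAM.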

Model Details

| Attribute    | Value                          |
|--------------|--------------------------------|
| Model Name   | Phi-4 bnb 4-bit                |
| Quantization | 4-bit (bitsandbytes)           |
| File Format  | .safetensors                   |
| Tokenizer    | phi-4-tokenizer.json           |
| Model Size   | 8.06B params                   |
| Tensor Types | F32, FP16, U8                  |
| VRAM Usage   | ~X GB (depending on batch size) |

Model tree for fhamborg/phi-4-4bit-bnb

  • Base model: microsoft/phi-4
  • This model is one of 115 quantized versions of the base model.