Phi-4 bnb (4-bit Quantized)

Model Description

This is a 4-bit quantized version of the Phi-4 transformer model, optimized for efficient inference while maintaining performance.

  • Base Model: Phi-4
  • Quantization: bitsandbytes (bnb), 4-bit
  • Format: safetensors
  • Tokenizer: Uses standard vocab.json and merges.txt
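Since the repo already stores the quantized weights, loading should work with a plain `from_pretrained` call, with `transformers` picking up the quantization settings from the checkpoint's config. A minimal sketch (assumes `transformers`, `accelerate`, and `bitsandbytes` are installed; the helper name is illustrative, not part of the repo):

```python
def load_phi4_4bit(model_id: str = "fhamborg/phi-4-4bit-bnb"):
    """Load the pre-quantized checkpoint and its tokenizer.

    The repo stores 4-bit bitsandbytes weights, so no explicit
    quantization_config should be needed at load time.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" places the weights on the available GPU(s).
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return model, tokenizer
```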

Intended Use

  • Fast inference with minimal VRAM usage
  • Deployment in resource-constrained environments
  • Optimized for low-latency text generation
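To make "minimal VRAM usage" concrete, here is a back-of-envelope estimate of weight memory for an 8.06B-parameter model (the parameter count is taken from the card; the figures cover weights only, ignoring activations and KV cache, so real usage is somewhat higher):

```python
PARAMS = 8.06e9  # parameter count from the model card

def weight_gib(bits_per_param: float) -> float:
    """Approximate weight memory in GiB (weights only)."""
    return PARAMS * bits_per_param / 8 / 2**30

# fp16 weights: ~15.0 GiB; 4-bit weights: ~3.8 GiB
fp16_gib = weight_gib(16)
four_bit_gib = weight_gib(4)
```

This roughly 4x reduction is what allows the model to fit on consumer GPUs with 8 GB or more of VRAM.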

Model Details

| Attribute    | Value                          |
|--------------|--------------------------------|
| Model Name   | Phi-4 bnb 4-bit                |
| Quantization | 4-bit (bitsandbytes)           |
| File Format  | .safetensors                   |
| Tokenizer    | phi-4-tokenizer.json           |
| Model Size   | 8.06B params                   |
| Tensor Types | F32, FP16, U8                  |
| VRAM Usage   | ~X GB (depending on batch size) |

Model tree for fhamborg/phi-4-4bit-bnb

  • Base model: microsoft/phi-4
  • This model is one of 115 quantized versions of the base model.