microsoft/phi-4 Quantized Models
Overview
This model is a GPTQ-quantized version of microsoft/phi-4. Japanese text was used as the calibration data in order to preserve performance in Japanese-language environments.
- Model Variants: Int4 (4-bit) and Int8 (8-bit)
- Base Model: microsoft/phi-4
- Model Size: 14,659,507,200 parameters
- Category: 10B ≤ parameters < 30B
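As a rough illustration of what the two quantized variants save, the weight-only memory footprint at each precision can be estimated directly from the parameter count above. This is a back-of-envelope sketch: it ignores activation memory and the small per-group scale/zero-point overhead that GPTQ adds.

```python
# Back-of-envelope weight-memory estimate for the 14,659,507,200-parameter
# phi-4 at different weight precisions (weights only; GPTQ's per-group
# scale/zero-point overhead is ignored).
N_PARAMS = 14_659_507_200

def weight_bytes(bits: int) -> int:
    """Bytes needed to store all weights at the given bit width."""
    return N_PARAMS * bits // 8

for label, bits in [("FP16 (original)", 16), ("Int8", 8), ("Int4", 4)]:
    print(f"{label}: {weight_bytes(bits) / 2**30:.1f} GiB")
# FP16 (original): 27.3 GiB
# Int8: 13.7 GiB
# Int4: 6.8 GiB
```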
Quantization Parameters (logged to W&B)
- bits: 4 or 8
- group_size: 128
- perc_damp: 0.01
- desc_act: True
- use_exllama: False
- model_seqlen: 2048
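For reference, the parameters above map onto Hugging Face transformers' `GPTQConfig` roughly as sketched below. This is an assumption, not the authors' published script: `damp_percent` is transformers' name for the `perc_damp` value listed above, and the calibration data shown is a placeholder for the (unpublished) Japanese text actually used.

```python
# Sketch: the quantization settings above, expressed as keyword arguments
# in the naming convention of transformers' GPTQConfig. The actual
# quantization script is not published here; this is an assumed mapping.
quant_kwargs = {
    "bits": 4,              # 4 for the Int4 variant, 8 for Int8
    "group_size": 128,
    "damp_percent": 0.01,   # listed as perc_damp above
    "desc_act": True,
    "use_exllama": False,
    "model_seqlen": 2048,
}

# With transformers (and the GPTQ backend) installed, quantization would
# then look roughly like this -- commented out because it downloads the
# base model and needs a GPU:
# from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
# tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")
# gptq_config = GPTQConfig(
#     dataset=["<Japanese calibration texts>"],  # placeholder
#     tokenizer=tokenizer,
#     **quant_kwargs,
# )
# model = AutoModelForCausalLM.from_pretrained(
#     "microsoft/phi-4", device_map="auto", quantization_config=gptq_config)
```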
Performance Evaluation
Evaluation results from Nejumi LLM Leaderboard 3 (W&B)
(Chart legend: blue = Original, orange = Int8 (8-bit), green = Int4 (4-bit).)
Benchmark Overall Results
| Model | GLP Average | ALT Average | Overall Average |
|---|---|---|---|
| phi-4 Int4 | 0.5815 | 0.6953 | 0.6384 |
| phi-4 Int8 | 0.5948 | 0.7015 | 0.6482 |
| phi-4 Original | 0.5950 | 0.7005 | 0.6477 |
General Language Performance (GLP) Details
| Subcategory | Int4 | Int8 | Original |
|---|---|---|---|
| Expression | 0.8567 | 0.8717 | 0.8583 |
| Translation | 0.8458 | 0.8480 | 0.8457 |
| Information Retrieval | 0.8780 | 0.8806 | 0.8809 |
| Reasoning | 0.6400 | 0.5850 | 0.6550 |
| Mathematical Reasoning | 0.5400 | 0.5967 | 0.5817 |
| Extraction | 0.3304 | 0.3408 | 0.3470 |
| Knowledge & QA | 0.5587 | 0.5735 | 0.5685 |
| MMLU_en | 0.3035 | 0.2351 | 0.2158 |
| Semantic Analysis | 0.4220 | 0.5200 | 0.5070 |
| Syntax Analysis | 0.4399 | 0.4967 | 0.4903 |
Note: The low MMLU_en scores are due to the model's inability to strictly follow the required answer format for this benchmark, rather than reflecting its actual knowledge or reasoning capabilities.
Alignment (ALT) Details
| Subcategory | Int4 | Int8 | Original |
|---|---|---|---|
| Controllability | 0.6908 | 0.6949 | 0.6938 |
| Ethics & Morality | 0.8800 | 0.9100 | 0.9000 |
| Toxicity | 0.8143 | 0.8121 | 0.8007 |
| Bias | 0.8858 | 0.8730 | 0.8650 |
| Robustness | 0.3717 | 0.4208 | 0.4226 |
| Truthfulness | 0.5292 | 0.4983 | 0.5206 |
Benchmark Scores
| Benchmark | Int4 | Int8 | Original |
|---|---|---|---|
| JASTER (0-shot) | 0.3880 | 0.4262 | 0.4186 |
| JASTER (2-shot) | 0.6136 | 0.6441 | 0.6398 |
| MT-Bench | 8.2438 | 8.2000 | 8.1313 |
| LCTG | 0.6860 | 0.6670 | 0.6750 |
Model Characteristics & Evaluation
- High Stability: Standard GPTQ quantization retains nearly all of the original performance for this 14B-class model
- Basic Tasks: Scores of 0.84+ are maintained in expression, translation, and information retrieval, and MT-Bench scores remain at the original model's level, which is very high for this model size
- Alignment: Particularly strong scores on the ethics & morality and bias metrics
License
This model follows the license of its base model microsoft/phi-4. Please refer to the base model's license for details.