---
pipeline_tag: text-generation
library_name: transformers
---
# Phi-4
Note: This checkpoint is copied from [Azure AI Foundry](https://ai.azure.com/explore/models/Phi-4/version/1/registry/azureml?tid=1a57c1b4-f329-43bc-be9d-db002574ae97).

Phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.

Phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.

For more information, reference the Phi-4 [Technical Report](https://www.microsoft.com/en-us/research/uploads/prod/2024/12/P4TechReport.pdf).

## Model Architecture
Phi-4 is a 14B-parameter, dense, decoder-only transformer model.
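
As a rough illustration of what a 14B-parameter dense model implies for hardware, the sketch below estimates the memory needed to hold the weights alone at a few common precisions. It assumes exactly 14 billion parameters (the card only states "14B") and ignores activations, KV cache, and framework overhead, so real usage will be higher. The helper name `weight_memory_gib` and the precision table are illustrative, not part of any library.

```python
# Back-of-the-envelope weight memory for a dense model.
# Assumption: exactly 14e9 parameters; the model card only says "14B".
PARAM_COUNT = 14_000_000_000

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(param_count: int, bytes_per_param: float) -> float:
    """Return the size of the weights in GiB (2**30 bytes)."""
    return param_count * bytes_per_param / 2**30


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>8}: {weight_memory_gib(PARAM_COUNT, nbytes):6.1f} GiB")
```

Under these assumptions, halving the bytes per parameter halves the weight footprint, which is why lower-precision formats are commonly used to fit models of this size on a single accelerator.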

## Training Data
Our training data is an extension of the data used for Phi-3 and draws on a wide variety of sources:
1. Publicly available documents filtered rigorously for quality, selected high-quality educational data, and code.
2. Newly created synthetic, "textbook-like" data for the purpose of teaching math, coding, common-sense reasoning, and general knowledge of the world (science, daily activities, theory of mind, etc.).
3. Acquired academic books and Q&A datasets.
4. High-quality, chat-format supervised data covering various topics to reflect human preferences on different aspects such as instruction following, truthfulness, honesty, and helpfulness.

Multilingual data constitutes about 8% of our overall data. We focus on data quality that could improve the model's reasoning ability, and we filter the publicly available documents to contain the correct level of knowledge.


## Data, media and languages

|Property|Description|
|---|---|
|Supported data types|Inputs: text, Outputs: text|
|Supported languages|en, ar, bn, cs, da, de, el, es, fa, fi, fr, gu, ha, he, hi, hu, id, it, ja, jv, kn, ko, ml, mr, nl, no, or, pa, pl, ps, pt, ro, ru, sv, sw, ta, te, th, tl, tr, uk, ur, vi, yo, zh|
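
For applications that need to check a request against the table above, a minimal sketch follows. It copies the language codes verbatim from the table and assumes they are two-letter ISO 639-1 style codes; the helper `is_supported` and its handling of region suffixes (for example `en-US`) are hypothetical conveniences, not something the model or any library provides.

```python
# Language codes copied from the "Supported languages" row of the table.
SUPPORTED_LANGUAGES = frozenset(
    "en ar bn cs da de el es fa fi fr gu ha he hi hu id it ja jv kn ko "
    "ml mr nl no or pa pl ps pt ro ru sv sw ta te th tl tr uk ur vi yo zh".split()
)


def is_supported(language_tag: str) -> bool:
    """Return True if the primary subtag of `language_tag` is listed.

    Hypothetical helper: accepts tags such as "en", "en-US", or "pt_BR"
    and compares only the part before the first separator.
    """
    primary = language_tag.replace("_", "-").split("-")[0].lower()
    return primary in SUPPORTED_LANGUAGES


if __name__ == "__main__":
    for tag in ("en-US", "pt_BR", "xx"):
        print(tag, is_supported(tag))
```

Being listed here only means the language appears in the table; it says nothing about output quality in that language.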