---
license: mit
language:
- en
base_model: prithivMLmods/Phi-4-o1
pipeline_tag: text-generation
library_name: transformers
tags:
- chain-of-thought
- phi3
- phi
- math
- code
- custom_code
- text-generation-inference
- phi-4
- llama-cpp
- gguf-my-repo
---
# Triangle104/Phi-4-o1-Q5_K_M-GGUF

This model was converted to GGUF format from [`prithivMLmods/Phi-4-o1`](https://huggingface.co/prithivMLmods/Phi-4-o1) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/prithivMLmods/Phi-4-o1) for more details on the model.

---
## Model details

Phi-4-o1 is a fine-tune of Microsoft's Phi-4, a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public-domain websites, and acquired academic books and Q&A datasets. The goal of this approach is to ensure that small, capable models are trained with high-quality data focused on advanced reasoning.

Phi-4 adopted a robust safety post-training approach that leverages a variety of open-source and in-house generated synthetic datasets. The safety alignment combines SFT (Supervised Fine-Tuning) and iterative DPO (Direct Preference Optimization), drawing on publicly available datasets focused on helpfulness and harmlessness as well as questions and answers targeting multiple safety categories.
## Dataset Info

Phi-4-o1 is fine-tuned on a synthetic dataset curated through a pipeline built explicitly for this purpose. The data is primarily based on the Chain of Thought (CoT) or Chain of Continuous Thought (COCONUT) methodologies, so the dataset is rich in reasoning, problem solving, and step-by-step breakdowns of complex tasks. The model is specifically designed to excel at reasoning, mathematics, and breaking down problems into logical, manageable steps.

## Run with Transformers
```python
# pip install transformers accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Phi-4-o1")
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/Phi-4-o1",
    device_map="auto",           # place the weights on the available GPU(s)
    torch_dtype=torch.bfloat16,  # bf16 weights to reduce memory use
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```
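If GPU memory is limited, the checkpoint can also be loaded in 4-bit. The snippet below is a minimal sketch that is not part of the original card; it assumes the optional `bitsandbytes` package is installed.

```python
# pip install bitsandbytes
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit quantized load to cut GPU memory use (output quality may degrade slightly).
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Phi-4-o1")
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/Phi-4-o1",
    device_map="auto",
    quantization_config=quant_config,
)
```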
You can ensure the correct chat template is applied by using `tokenizer.apply_chat_template` as follows:
```python
messages = [
    {"role": "user", "content": "Write me a poem about Machine Learning."},
]
# add_generation_prompt=True appends the assistant header so the model starts its reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```
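Because the model is tuned on chain-of-thought style data (see Dataset Info above), reasoning-heavy prompts work well. The snippet below is an illustrative sketch, not part of the original card; it reuses the `model` and `tokenizer` loaded above and asks for step-by-step working with greedy decoding.

```python
messages = [
    {"role": "user", "content": "Solve step by step: a train travels 120 km in 1.5 hours. What is its average speed in km/h?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to("cuda")

# Greedy decoding keeps the step-by-step answer deterministic.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```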
## Intended Use

The Phi-4-o1 model is designed for a wide range of applications, particularly those requiring advanced reasoning, high-quality text generation, and multilingual capabilities. Below are some of the intended use cases:

- **Complex Reasoning Tasks:**
  - Solving intricate problems in mathematics, logic, and science.
  - Assisting in academic research by providing detailed explanations and summaries.
- **Multilingual Applications:**
  - Translating text across multiple languages while preserving context and nuance.
  - Generating content in various languages for global audiences.
- **Content Creation:**
  - Assisting writers, marketers, and creators with high-quality text generation.
  - Generating creative ideas, stories, and technical documentation.
- **Educational Tools:**
  - Providing explanations, tutoring, and Q&A support for students and educators.
  - Generating practice questions and answers for learning purposes.
- **Customer Support:**
  - Automating responses to customer queries with accurate and helpful information.
  - Handling complex customer service scenarios with advanced reasoning.
- **Safety-Critical Applications:**
  - Ensuring responses are aligned with safety guidelines, making it suitable for sensitive domains.
  - Providing harmlessness-focused interactions in public-facing applications.
## Limitations

While Phi-4-o1 is a powerful and versatile model, it has certain limitations that users should be aware of:

- **Bias and Fairness:** Despite rigorous training and safety alignment, the model may still exhibit biases present in the training data. Users should critically evaluate outputs, especially in sensitive contexts.
- **Contextual Understanding:** The model may occasionally misinterpret complex or ambiguous prompts, leading to inaccurate or irrelevant responses.
- **Real-Time Knowledge:** The model's knowledge is limited to the data it was trained on and does not include real-time or post-training updates. It may not be aware of recent events or developments.
- **Safety and Harmlessness:** While extensive efforts have been made to align the model with safety guidelines, it may still generate outputs that are inappropriate or harmful in certain contexts. Continuous monitoring and human oversight are recommended.
- **Resource Requirements:** Running the model efficiently may require significant computational resources, especially for large-scale or real-time applications.
- **Ethical Considerations:** The model should not be used for malicious purposes, such as generating harmful content, misinformation, or spam. Users are responsible for ensuring ethical use.
- **Domain-Specific Limitations:** While the model performs well on general-purpose tasks, it may lack depth in highly specialized domains (e.g., medical, legal, or financial fields) without additional fine-tuning.

---
## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux).
```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.
### CLI:
```bash
llama-cli --hf-repo Triangle104/Phi-4-o1-Q5_K_M-GGUF --hf-file phi-4-o1-q5_k_m.gguf -p "The meaning to life and the universe is"
```
### Server:
```bash
llama-server --hf-repo Triangle104/Phi-4-o1-Q5_K_M-GGUF --hf-file phi-4-o1-q5_k_m.gguf -c 2048
```
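Once the server is running, it exposes an OpenAI-compatible chat endpoint. The snippet below is a minimal sketch, not part of the original card, assuming the default host and port (`127.0.0.1:8080`).

```python
# pip install requests
import requests

# Query llama-server's OpenAI-compatible chat completions endpoint.
resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Explain, step by step, why 17 is a prime number."}
        ],
        "max_tokens": 256,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```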
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g., `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Triangle104/Phi-4-o1-Q5_K_M-GGUF --hf-file phi-4-o1-q5_k_m.gguf -p "The meaning to life and the universe is"
```
or
```
./llama-server --hf-repo Triangle104/Phi-4-o1-Q5_K_M-GGUF --hf-file phi-4-o1-q5_k_m.gguf -c 2048
```