---
license: mit
license_link: https://huggingface.co/microsoft/phi-4/resolve/main/LICENSE
language:
- en
pipeline_tag: text-generation
tags:
- phi
- nlp
- math
- code
- chat
- conversational
- phi3
inference:
  parameters:
    temperature: 0
widget:
- messages:
  - role: user
    content: How many R's in strawberry? Think step by step.
library_name: transformers
datasets:
- amphora/QwQ-LongCoT-130K
base_model:
- microsoft/phi-4
model-index:
- name: SuperThoughts-CoT-14B-16k-o1-QwQ
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: wis-k/instruction-following-eval
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 5.15
      name: averaged accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Pinkstack%2FSuperThoughts-CoT-14B-16k-o1-QwQ
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: SaylorTwift/bbh
      split: test
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 52.85
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Pinkstack%2FSuperThoughts-CoT-14B-16k-o1-QwQ
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: lighteval/MATH-Hard
      split: test
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 40.79
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Pinkstack%2FSuperThoughts-CoT-14B-16k-o1-QwQ
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 19.02
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Pinkstack%2FSuperThoughts-CoT-14B-16k-o1-QwQ
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 21.79
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Pinkstack%2FSuperThoughts-CoT-14B-16k-o1-QwQ
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 47.43
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Pinkstack%2FSuperThoughts-CoT-14B-16k-o1-QwQ
      name: Open LLM Leaderboard
---

GGUF / final version: https://huggingface.co/Pinkstack/PARM-V2-phi-4-16k-CoT-o1-gguf

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/QDHJhI0EVT_L9AHY_g3Br.png)

[Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905)

Phi-4, fine-tuned to be stronger at reasoning.

Unlike our other Parm models, we had to optimize the fine-tuning process to preserve accuracy while still being able to release this model. **Training loss: 0.443800**

The model uses the following prompt format (a modified Phi-4 prompt):
```
{{ if .System }}<|system|>
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|im_end|>
{{ end }}<|assistant|>{{ .CoT }}<|CoT|>
{{ .Response }}<|FinalAnswer|><|im_end|>
```

It is recommended to use a system prompt like this one:

```
You are a helpful ai assistant. Make sure to put your finalanswer at the end.
```
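
As a rough illustration, the template above can be rendered in Python for the generation step. This is a sketch under assumptions: the helper name `build_prompt` is hypothetical, and it assumes the model should start generating right after `<|assistant|>` (the template places `{{ .CoT }}` immediately there), so the `<|CoT|>`, `<|FinalAnswer|>` and `<|im_end|>` markers are produced by the model itself.

```python
from typing import Optional

# Recommended system prompt, copied verbatim from this card.
DEFAULT_SYSTEM = (
    "You are a helpful ai assistant. "
    "Make sure to put your finalanswer at the end."
)


def build_prompt(user: str, system: Optional[str] = DEFAULT_SYSTEM) -> str:
    """Render the modified Phi-4 template up to the assistant turn.

    Hypothetical helper: mirrors the {{ if .System }} and {{ if .Prompt }}
    blocks, then stops right after <|assistant|>, where the model is
    assumed to begin its chain of thought.
    """
    parts = []
    if system:  # {{ if .System }}<|system|>\n{{ .System }}<|im_end|>\n{{ end }}
        parts.append(f"<|system|>\n{system}<|im_end|>\n")
    if user:  # {{ if .Prompt }}<|user|>\n{{ .Prompt }}<|im_end|>\n{{ end }}
        parts.append(f"<|user|>\n{user}<|im_end|>\n")
    parts.append("<|assistant|>")  # generation continues from here
    return "".join(parts)
```

Passing `system=None` (or an empty string) drops the system block, matching the template's `{{ if .System }}` branch.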

# 🧀 Examples

(Q4_K_M quantization on a 10 GB RTX 3080 with 64 GB of RAM, running inside Msty. All examples use "You are a friendly ai assistant." as the system prompt.)

**Example 1:**

![example1](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/NoLJREYFU8LdMwynyLLMG.png)

**Example 2:**

![example2](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/uboFipmS1ulfxeDgMBsBH.png)

**Example 3:**

![example3](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/c4h-nw0DPTrQgX-_tvBoT.png)

**Example 4:**

![example4 part 1](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/Dcd6-wbpDQuXoulHaqATo.png)

![example4 part 2](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/CoBYmYiRt9Z4IDFoOwHxc.png)

All examples were generated locally, and pretty quickly too! 😲 Due to our very limited resources, we weren't able to evaluate this model ourselves (yet). If you evaluate it, please let us know!

# 🧀 Information

- ⚠️ A low temperature must be used so the model doesn't fail at reasoning; we use 0.3 to 0.8.
- ⚠️ Due to the current prompt format, the model may sometimes emit `<|FinalAnswer|>` without providing a final answer at the end. You can ignore this or modify the prompt format.
- This is our flagship model, with top-tier reasoning that rivals Gemini 2.0 Flash Thinking (experimental) and o1-mini; overall, results are similar to both. We are not comparing against QwQ, because its responses are much longer and waste tokens.
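
Because the model may emit `<|FinalAnswer|>` with no answer before it (see the warning above), a client can split the raw completion on the markers from the prompt format and treat an empty answer as missing. A minimal sketch, assuming the completion is the text generated after `<|assistant|>`; the helper name `split_response` is hypothetical and not part of any library.

```python
from typing import Optional, Tuple

COT_MARK = "<|CoT|>"
ANSWER_MARK = "<|FinalAnswer|>"
END_MARK = "<|im_end|>"


def split_response(completion: str) -> Tuple[str, Optional[str]]:
    """Split a raw completion into (reasoning, final_answer).

    The final answer is None when the model never closed its reasoning
    with <|CoT|>, or when it left the answer section empty.
    """
    text = completion.strip()
    if text.endswith(END_MARK):  # drop the trailing end-of-turn token
        text = text[: -len(END_MARK)]
    reasoning, sep, rest = text.partition(COT_MARK)
    if not sep:  # no <|CoT|> marker: everything is still reasoning
        return reasoning.strip(), None
    # Keep only the text between <|CoT|> and <|FinalAnswer|>.
    answer = rest.split(ANSWER_MARK, 1)[0].strip()
    return reasoning.strip(), answer or None
```

Returning `None` instead of an empty string makes the "no final answer" case explicit, so a caller can fall back to showing the reasoning.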

# Uploaded model

- **Developed by:** Pinkstack
- **License:** MIT
- **Fine-tuned from model:** microsoft/phi-4

This Phi-4 model was trained with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Pinkstack__SuperThoughts-CoT-14B-16k-o1-QwQ-details)!

Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Pinkstack%2FSuperThoughts-CoT-14B-16k-o1-QwQ&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!

| Metric            |Value (%)|
|-------------------|--------:|
|**Average**        |    31.17|
|IFEval (0-Shot)    |     5.15|
|BBH (3-Shot)       |    52.85|
|MATH Lvl 5 (4-Shot)|    40.79|
|GPQA (0-shot)      |    19.02|
|MuSR (0-shot)      |    21.79|
|MMLU-PRO (5-shot)  |    47.43|
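
The **Average** row matches the unweighted mean of the six benchmark scores in the table: (5.15 + 52.85 + 40.79 + 19.02 + 21.79 + 47.43) / 6 = 187.03 / 6 ≈ 31.17. A quick check in Python (the dictionary simply restates the table; `leaderboard_average` is an illustrative helper, not a leaderboard API):

```python
# Scores (%) copied from the table above.
SCORES = {
    "IFEval (0-Shot)": 5.15,
    "BBH (3-Shot)": 52.85,
    "MATH Lvl 5 (4-Shot)": 40.79,
    "GPQA (0-shot)": 19.02,
    "MuSR (0-shot)": 21.79,
    "MMLU-PRO (5-shot)": 47.43,
}


def leaderboard_average(scores: dict) -> float:
    """Unweighted mean of the per-benchmark scores, rounded to 2 decimals."""
    return round(sum(scores.values()) / len(scores), 2)


print(leaderboard_average(SCORES))  # → 31.17
```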