merge

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the TIES merge method, with prithivMLmods/Phi-4-QwQ as the base.

Models Merged

The following models were included in the merge (as listed in the configuration below):

  • benhaotang/Phi-4-llama-t1-full
  • prithivMLmods/Phi-4-QwQ

Eval

IFEval is broken because of Sky-T1's strict system prompt format. Apart from that, the merge seems to have recreated QwQ at 14B.

Running

  • With Ollama
ollama run hf.co/benhaotang/phi4-qwq-sky-t1-Q4_K_M-GGUF

I suggest adding SYSTEM "You are a helpful AI assistant. You always think step by step." to trigger step-by-step reasoning.
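One way to apply the suggested SYSTEM line is through an Ollama Modelfile. The following is a minimal sketch rather than a tested recipe: it assumes that `FROM` accepts the same `hf.co/...` reference used in the `ollama run` command above, and the helper name `build_modelfile` is made up for illustration.

```python
# Sketch: generate a Modelfile that bakes the suggested system prompt into the model.
# Assumption: FROM accepts the same hf.co reference that `ollama run` uses above.

MODEL = "hf.co/benhaotang/phi4-qwq-sky-t1-Q4_K_M-GGUF"
SYSTEM_PROMPT = "You are a helpful AI assistant. You always think step by step."


def build_modelfile(model: str, system_prompt: str) -> str:
    """Return Modelfile text with a FROM line and a SYSTEM line (hypothetical helper)."""
    return f'FROM {model}\nSYSTEM """{system_prompt}"""\n'


if __name__ == "__main__":
    # Print the Modelfile text so it can be saved to a file named "Modelfile".
    print(build_modelfile(MODEL, SYSTEM_PROMPT), end="")
```

After saving the output as `Modelfile`, something like `ollama create phi4-qwq-sky-t1-steps -f Modelfile` followed by `ollama run phi4-qwq-sky-t1-steps` should pick up the system prompt. The name `phi4-qwq-sky-t1-steps` is only a placeholder.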

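  • With PyTorch (transformers)
import transformers
from transformers import AutoTokenizer

# The merge keeps the Phi-4 architecture, so the tokenizer is loaded from microsoft/phi-4.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")
pipeline = transformers.pipeline(
    "text-generation",
    model="benhaotang/phi4-qwq-sky-t1",
    tokenizer=tokenizer,
    device_map="auto",
)
# The system prompt nudges the model toward step-by-step reasoning.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant. You always think step by step."},
    {"role": "user", "content": "Give me a short introduction to renormalization group (RG) flow in physics."},
]
outputs = pipeline(messages, max_new_tokens=128)
# For chat-style input, generated_text holds the conversation, ending with the model's reply.
print(outputs[0]["generated_text"])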

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: prithivMLmods/Phi-4-QwQ
    #no parameters necessary for base model
  - model: benhaotang/Phi-4-llama-t1-full
    parameters:
      density: 0.5
      weight: 0.5
  - model: prithivMLmods/Phi-4-QwQ
    parameters:
      density: 0.5
      weight: 0.5

merge_method: ties
base_model: prithivMLmods/Phi-4-QwQ
parameters:
  normalize: false
  int8_mask: true
dtype: float16
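To make the `density`, `weight`, and `normalize` settings above more concrete, here is a simplified NumPy sketch of the TIES idea (trim each task vector, elect a sign per parameter, merge only the agreeing entries). It is an illustration under stated assumptions, not mergekit's implementation: the function names `trim` and `ties_merge` and the toy tensors are invented for this example, and details such as `int8_mask` and per-tensor handling are ignored.

```python
import numpy as np


def trim(delta: np.ndarray, density: float) -> np.ndarray:
    """Keep the `density` fraction of entries with the largest magnitude, zero the rest."""
    k = int(round(density * delta.size))
    out = np.zeros_like(delta)
    if k > 0:
        keep = np.argsort(np.abs(delta), axis=None)[-k:]  # flat indices of the top-k entries
        out.flat[keep] = delta.flat[keep]
    return out


def ties_merge(base, models, densities, weights, normalize=False):
    """Simplified TIES merge of `models` onto `base` (illustrative sketch only)."""
    # 1. Task vectors (model minus base), trimmed by density and scaled by weight.
    deltas = np.stack(
        [trim(m - base, d) * w for m, d, w in zip(models, densities, weights)]
    )
    # 2. Elect a sign per parameter from the summed weighted deltas.
    elected = np.sign(deltas.sum(axis=0))
    # 3. Keep only non-zero entries whose sign agrees with the elected sign.
    agree = (np.sign(deltas) == elected) & (deltas != 0)
    merged = (deltas * agree).sum(axis=0)
    if normalize:
        # Divide by the total weight of the agreeing models at each position.
        w = np.asarray(weights, dtype=float).reshape((-1,) + (1,) * base.ndim)
        denom = (w * agree).sum(axis=0)
        merged = np.divide(merged, denom, out=np.zeros_like(merged), where=denom != 0)
    return base + merged


# Toy tensors standing in for real model weights (hypothetical values).
base = np.zeros(4)
a = np.array([1.0, -2.0, 0.1, 3.0])
b = np.array([-0.5, 4.0, 2.0, 0.2])
print(ties_merge(base, [a, b], densities=[0.5, 0.5], weights=[0.5, 0.5]))
```

In this sketch, a model that is identical to the base has an all-zero task vector and therefore contributes nothing to the result. With `normalize=False`, as in the configuration above, the agreeing weighted deltas are summed without rescaling.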

Open LLM Leaderboard Evaluation Results

Detailed results can be found here! Summarized results can be found here!

Metric                 Value (%)
Average                30.78
IFEval (0-Shot)         4.60
BBH (3-Shot)           52.61
MATH Lvl 5 (4-Shot)    39.58
GPQA (0-shot)          19.35
MuSR (0-shot)          21.38
MMLU-PRO (5-shot)      47.16
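As a quick sanity check on the table, the reported Average is consistent with an unweighted mean of the six benchmark scores. That interpretation is an assumption; the snippet below only redoes the arithmetic with the values listed above.

```python
# Benchmark scores copied from the table above (values in %).
scores = {
    "IFEval (0-Shot)": 4.60,
    "BBH (3-Shot)": 52.61,
    "MATH Lvl 5 (4-Shot)": 39.58,
    "GPQA (0-shot)": 19.35,
    "MuSR (0-shot)": 21.38,
    "MMLU-PRO (5-shot)": 47.16,
}

# Assumption: the Average row is the unweighted mean of the six scores.
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")
```

The very low IFEval score pulls the mean down noticeably, which matches the note in the Eval section that IFEval is broken for this model.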
Model size: 14.7B params (Safetensors), tensor type FP16.