Phi-2-psy

Phi-2-psy is a merge of the following models: rhysjones/phi-2-orange and cognitivecomputations/dolphin-2_6-phi-2.

πŸ† Evaluation

The evaluation was performed using LLM AutoEval on the Nous benchmark suite.

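| Model             | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|-------------------|---------|---------|------------|----------|---------|
| phi-2-psy         | 34.4    | 71.4    | 48.2       | 38.1     | 48.02   |
| phixtral-2x2_8    | 34.1    | 70.4    | 48.8       | 37.8     | 47.78   |
| dolphin-2_6-phi-2 | 33.1    | 69.9    | 47.4       | 37.2     | 46.89   |
| phi-2-orange      | 33.4    | 71.3    | 49.9       | 37.3     | 47.97   |
| phi-2             | 28.0    | 70.8    | 44.4       | 35.2     | 44.61   |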

🧩 Configuration

slices:
  - sources:
      - model: rhysjones/phi-2-orange
        layer_range: [0, 32]
      - model: cognitivecomputations/dolphin-2_6-phi-2
        layer_range: [0, 32]
merge_method: slerp
base_model: rhysjones/phi-2-orange
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
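
To make the configuration above more concrete, here is a minimal, dependency-free sketch of what a slerp merge with per-layer `t` values might look like. It is an illustration under stated assumptions, not the merge tool's actual implementation: the helper names `slerp` and `layer_t` are hypothetical, weights are treated as flat lists of floats, the five anchor values in each `value` list are assumed to be linearly interpolated across the 32 layers, and `t = 0` is assumed to return the base model's weights (rhysjones/phi-2-orange) while `t = 1` returns the other model's.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    t = 0 returns v0, t = 1 returns v1 (assumed convention).
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1 + eps)
    dot = max(-1.0, min(1.0, dot))
    # Nearly colinear vectors: fall back to plain linear interpolation.
    if abs(dot) > 0.9995:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

def layer_t(anchors, layer_idx, num_layers):
    """Interpolate a list of anchor values linearly across layer depth
    (hypothetical helper; assumes evenly spaced anchors)."""
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = min(int(pos), len(anchors) - 2)
    frac = pos - lo
    return anchors[lo] * (1 - frac) + anchors[lo + 1] * frac

if __name__ == "__main__":
    self_attn_anchors = [0, 0.5, 0.3, 0.7, 1]  # from the config above
    mlp_anchors = [1, 0.5, 0.7, 0.3, 0]        # from the config above
    for layer in (0, 8, 16, 24, 31):
        print(layer,
              round(layer_t(self_attn_anchors, layer, 32), 3),
              round(layer_t(mlp_anchors, layer, 32), 3))
```

Under these assumptions, the self-attention weights move from the base model in the first layers toward the second model in the last layers, while the MLP weights follow the opposite schedule; all remaining tensors use the constant `t = 0.5`.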

💻 Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Create the model and all new tensors on the GPU by default.
torch.set_default_device("cuda")

# Load the merged model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("vince62s/phi-2-psy", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("vince62s/phi-2-psy", trust_remote_code=True)

# Tokenize a code-completion prompt.
inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)

# Generate up to 200 tokens in total (prompt included), then decode and print.
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric                            | Value |
|-----------------------------------|-------|
| Avg.                              | 62.80 |
| AI2 Reasoning Challenge (25-shot) | 60.84 |
| HellaSwag (10-shot)               | 75.52 |
| MMLU (5-shot)                     | 57.57 |
| TruthfulQA (0-shot)               | 48.22 |
| Winogrande (5-shot)               | 75.45 |
| GSM8k (5-shot)                    | 59.21 |
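
As a quick sanity check, the reported average appears to be the plain arithmetic mean of the six benchmark scores listed above. The short sketch below recomputes it; the dictionary name `scores` is only illustrative, and the values are copied from the results as reported.

```python
# Benchmark scores as reported in the results above.
scores = {
    "AI2 Reasoning Challenge (25-shot)": 60.84,
    "HellaSwag (10-shot)": 75.52,
    "MMLU (5-shot)": 57.57,
    "TruthfulQA (0-shot)": 48.22,
    "Winogrande (5-shot)": 75.45,
    "GSM8k (5-shot)": 59.21,
}

# Unweighted arithmetic mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")
```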