|
---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- gguf
base_model: unsloth/phi-4-unsloth-bnb-4bit
datasets:
- bespokelabs/Bespoke-Stratos-17k
- bespokelabs/Bespoke-Stratos-35k
- NovaSky-AI/Sky-T1_data_17k
- Quazim0t0/BenfordsLawReasoningJSON
- open-thoughts/OpenThoughts-114k
model-index:
- name: Phi4.Turn.R1Distill_v1.5.1-Tensors
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 29.95
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Quazim0t0/Phi4.Turn.R1Distill_v1.5.1-Tensors
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 49.22
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Quazim0t0/Phi4.Turn.R1Distill_v1.5.1-Tensors
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 1.59
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Quazim0t0/Phi4.Turn.R1Distill_v1.5.1-Tensors
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 2.46
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Quazim0t0/Phi4.Turn.R1Distill_v1.5.1-Tensors
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 7.04
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Quazim0t0/Phi4.Turn.R1Distill_v1.5.1-Tensors
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 45.75
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Quazim0t0/Phi4.Turn.R1Distill_v1.5.1-Tensors
      name: Open LLM Leaderboard
---
|
|
|
# TurnPhi Project |
|
|
|
- **Developed by:** Quazim0t0 |
|
- **Fine-tuned from model:** unsloth/phi-4-unsloth-bnb-4bit
|
- **Format:** GGUF
|
- **Trained for 8 hours on an A800 with the Bespoke-Stratos-17k dataset.**

- **Trained for 6 hours on an A800 with the Bespoke-Stratos-35k dataset.**

- **Trained for 2 hours on an A800 with the small 430-row Benford's Law Reasoning dataset, with care taken to avoid overfitting.**

- **Trained for 4 hours on an A800 with the Sky-T1_data_17k dataset.**

- **Trained for 6 hours on an A800 with the OpenThoughts-114k dataset.**

- **About $18 in total training cost... I'm actually amazed by the results.**
|
|
|
# OpenWeb UI Function |
|
If you use this model with Open WebUI, here is a simple function that organizes the model's responses: https://openwebui.com/f/quaz93/phi4_turn_r1_distill_thought_function_v1
|
|
|
# Phi4 Turn R1Distill LoRA Adapters |
|
|
|
## Overview |
|
These **LoRA adapters** were trained using diverse **reasoning datasets** that incorporate structured **Thought** and **Solution** responses to enhance logical inference. This project was designed to **test the R1 dataset** on **Phi-4**, aiming to create a **lightweight, fast, and efficient reasoning model**. |
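
For illustration, here is a minimal sketch of splitting such a structured response into its reasoning and answer parts. The `<Thought>`/`<Solution>` tag names are an assumption for demonstration; the actual datasets and the Open WebUI function linked above may use different markers:

```python
import re

def split_response(text: str) -> dict:
    """Separate the reasoning trace from the final answer.

    Assumes <Thought>...</Thought> and <Solution>...</Solution> markers;
    the tag names are illustrative, not a documented format.
    """
    thought = re.search(r"<Thought>(.*?)</Thought>", text, re.DOTALL)
    solution = re.search(r"<Solution>(.*?)</Solution>", text, re.DOTALL)
    return {
        "thought": thought.group(1).strip() if thought else "",
        "solution": solution.group(1).strip() if solution else text.strip(),
    }

example = "<Thought>12 * 13 = 120 + 36 = 156</Thought><Solution>156</Solution>"
print(split_response(example))
```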
|
|
|
All adapters were fine-tuned on an **NVIDIA A800 GPU** and are suitable for continued training, merging, or direct deployment.
|
As part of an open-source initiative, all resources are made **publicly available** for unrestricted research and development. |
|
|
|
--- |
|
|
|
## LoRA Adapters |
|
Below are the currently available LoRA fine-tuned adapters (**as of January 30, 2025**): |
|
|
|
- [Phi4.Turn.R1Distill-Lora1](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill-Lora1) |
|
- [Phi4.Turn.R1Distill-Lora2](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill-Lora2) |
|
- [Phi4.Turn.R1Distill-Lora3](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill-Lora3) |
|
- [Phi4.Turn.R1Distill-Lora4](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill-Lora4) |
|
- [Phi4.Turn.R1Distill-Lora5](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill-Lora5) |
|
- [Phi4.Turn.R1Distill-Lora6](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill-Lora6) |
|
- [Phi4.Turn.R1Distill-Lora7](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill-Lora7) |
|
- [Phi4.Turn.R1Distill-Lora8](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill-Lora8) |
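
Since these are standard PEFT adapters, any of them can be merged into the base weights to produce a standalone checkpoint for deployment or further fine-tuning. A minimal sketch using `peft` (the adapter choice and output path are illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model and apply one of the adapters listed above
base = AutoModelForCausalLM.from_pretrained("microsoft/phi-4")
model = PeftModel.from_pretrained(base, "Quazim0t0/Phi4.Turn.R1Distill-Lora1")

# Fold the LoRA weights into the base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("phi4-turn-r1distill-merged")
```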
|
|
|
--- |
|
|
|
## GGUF Full & Quantized Models |
|
To facilitate broader testing and real-world inference, **full-precision and quantized GGUF versions** are provided for evaluation in **Open WebUI** and other LLM interfaces.
|
|
|
### **Version 1** |
|
- [Phi4.Turn.R1Distill.Q8_0](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill.Q8_0) |
|
- [Phi4.Turn.R1Distill.Q4_k](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill.Q4_k) |
|
- [Phi4.Turn.R1Distill.16bit](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill.16bit) |
|
|
|
### **Version 1.1** |
|
- [Phi4.Turn.R1Distill_v1.1_Q4_k](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill_v1.1_Q4_k) |
|
|
|
### **Version 1.2** |
|
- [Phi4.Turn.R1Distill_v1.2_Q4_k](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill_v1.2_Q4_k) |
|
|
|
### **Version 1.3** |
|
- [Phi4.Turn.R1Distill_v1.3_Q4_k-GGUF](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill_v1.3_Q4_k-GGUF) |
|
|
|
### **Version 1.4** |
|
- [Phi4.Turn.R1Distill_v1.4_Q4_k-GGUF](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill_v1.4_Q4_k-GGUF) |
|
|
|
### **Version 1.5** |
|
- [Phi4.Turn.R1Distill_v1.5_Q4_k-GGUF](https://huggingface.co/Quazim0t0/Phi4.Turn.R1Distill_v1.5_Q4_k-GGUF) |
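
These GGUF files can be run directly with llama.cpp-based tooling. A minimal sketch using `llama-cpp-python`, assuming a quantized file is downloaded from one of the repositories above (the repo ID and filename below are illustrative; check the repository's file list for the exact name):

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download a quantized GGUF file from the Hub (filename is an assumption;
# check the repository for the actual file name)
model_path = hf_hub_download(
    repo_id="Quazim0t0/Phi4.Turn.R1Distill_v1.5_Q4_k-GGUF",
    filename="phi4.turn.r1distill_v1.5_q4_k.gguf",
)

# Load the model and run a short completion
llm = Llama(model_path=model_path, n_ctx=4096)
out = llm("Explain Benford's Law in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```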
|
|
|
--- |
|
|
|
## Usage |
|
|
|
### **Loading LoRA Adapters with `transformers` and `peft`** |
|
To load and apply the LoRA adapters on Phi-4, use the following approach: |
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Base model and LoRA adapter repositories on the Hugging Face Hub
base_model = "microsoft/phi-4"
lora_adapter = "Quazim0t0/Phi4.Turn.R1Distill-Lora1"

# Load the tokenizer and base model, then apply the adapter weights
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
model = PeftModel.from_pretrained(model, lora_adapter)

# Switch to inference mode
model.eval()
```
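
Once loaded, a short generation pass is a quick way to sanity-check the adapter. This is a minimal sketch building on the snippet above (the prompt and generation settings are illustrative):

```python
prompt = "Solve step by step: what is 12 * 13?"

# Tokenize, generate, and decode; settings are illustrative defaults
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```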
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Quazim0t0__Phi4.Turn.R1Distill_v1.5.1-Tensors-details) |
|
|
|
| Metric              | Value |
|---------------------|------:|
| Avg.                | 22.67 |
| IFEval (0-Shot)     | 29.95 |
| BBH (3-Shot)        | 49.22 |
| MATH Lvl 5 (4-Shot) |  1.59 |
| GPQA (0-shot)       |  2.46 |
| MuSR (0-shot)       |  7.04 |
| MMLU-PRO (5-shot)   | 45.75 |
|
|
|
|