macadeliccc
/

SOLAR-10.7b-Instruct-truthy-dpo

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

SOLAR-10.7b-Instruct-truthy-dpo

This model is a finetune of macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo

Process

I finetuned upstageai/Solar-10.7b-Instruct-v0.1 with 1 epoch of Intel/orca_dpo_pairs (12.4k samples)
I futher finetuned that model with 3 epochs of jondurbin/truthy-dpo-v0.1 (1.04k samples)
This process is experimental and the base model linked above is more tested at this time.

GGUF

Available here

Evaluations

----Benchmark Complete---- + 2024-01-26 20:57:38 + Time taken: 25.4 mins + Prompt Format: ChatML + Model: macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo-GGUF + Score (v2): 74.11 + Parseable: 171.0

Batch completed Time taken: 25.5 mins

Model	AGIEval	GPT4All	TruthfulQA	Bigbench	Average
SOLAR-10.7b-Instruct-truthy-dpo	48.69	73.82	76.81	45.71	61.26

AGIEval

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	27.95	±	2.82
		acc_norm	27.95	±	2.82
agieval_logiqa_en	0	acc	42.40	±	1.94
		acc_norm	42.24	±	1.94
agieval_lsat_ar	0	acc	25.65	±	2.89
		acc_norm	23.91	±	2.82
agieval_lsat_lr	0	acc	54.12	±	2.21
		acc_norm	54.51	±	2.21
agieval_lsat_rc	0	acc	69.89	±	2.80
		acc_norm	69.89	±	2.80
agieval_sat_en	0	acc	80.10	±	2.79
		acc_norm	80.10	±	2.79
agieval_sat_en_without_passage	0	acc	50.00	±	3.49
		acc_norm	49.51	±	3.49
agieval_sat_math	0	acc	42.27	±	3.34
		acc_norm	41.36	±	3.33

Average: 48.69%

GPT4All

Task	Version	Metric	Value		Stderr
arc_challenge	0	acc	59.90	±	1.43
		acc_norm	63.91	±	1.40
arc_easy	0	acc	80.85	±	0.81
		acc_norm	78.16	±	0.85
boolq	1	acc	88.20	±	0.56
hellaswag	0	acc	68.34	±	0.46
		acc_norm	86.39	±	0.34
openbookqa	0	acc	37.60	±	2.17
		acc_norm	46.80	±	2.23
piqa	0	acc	78.84	±	0.95
		acc_norm	78.78	±	0.95
winogrande	0	acc	74.51	±	1.22

Average: 73.82%

TruthfulQA

Task	Version	Metric	Value		Stderr
truthfulqa_mc	1	mc1	61.81	±	1.70
		mc2	76.81	±	1.42

Average: 76.81%

Bigbench

Task	Version	Metric	Value		Stderr
bigbench_causal_judgement	0	multiple_choice_grade	50.53	±	3.64
bigbench_date_understanding	0	multiple_choice_grade	63.14	±	2.51
bigbench_disambiguation_qa	0	multiple_choice_grade	47.67	±	3.12
bigbench_geometric_shapes	0	multiple_choice_grade	26.18	±	2.32
		exact_str_match	0.00	±	0.00
bigbench_logical_deduction_five_objects	0	multiple_choice_grade	28.60	±	2.02
bigbench_logical_deduction_seven_objects	0	multiple_choice_grade	21.29	±	1.55
bigbench_logical_deduction_three_objects	0	multiple_choice_grade	47.33	±	2.89
bigbench_movie_recommendation	0	multiple_choice_grade	39.80	±	2.19
bigbench_navigate	0	multiple_choice_grade	63.80	±	1.52
bigbench_reasoning_about_colored_objects	0	multiple_choice_grade	59.05	±	1.10
bigbench_ruin_names	0	multiple_choice_grade	40.18	±	2.32
bigbench_salient_translation_error_detection	0	multiple_choice_grade	46.69	±	1.58
bigbench_snarks	0	multiple_choice_grade	65.19	±	3.55
bigbench_sports_understanding	0	multiple_choice_grade	72.41	±	1.42
bigbench_temporal_sequences	0	multiple_choice_grade	60.30	±	1.55
bigbench_tracking_shuffled_objects_five_objects	0	multiple_choice_grade	25.76	±	1.24
bigbench_tracking_shuffled_objects_seven_objects	0	multiple_choice_grade	17.43	±	0.91
bigbench_tracking_shuffled_objects_three_objects	0	multiple_choice_grade	47.33	±	2.89

Average: 45.71%

Average score: 61.26%

Elapsed time: 02:16:03

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	74.11
AI2 Reasoning Challenge (25-Shot)	72.10
HellaSwag (10-Shot)	88.44
MMLU (5-Shot)	65.45
TruthfulQA (0-shot)	76.75
Winogrande (5-shot)	82.72
GSM8k (5-shot)	59.21

Downloads last month: 147

Safetensors

Model size

10.7B params

Tensor type

FP16

·

Inference Examples

Text Generation

This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo

Merges

1 model

Quantizations

Spaces using macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo 5

Collection including macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo

DPO fine tunes

3 items • Updated Jul 11

Evaluation results

normalized accuracy on AI2 Reasoning Challenge (25-Shot)
test set Open LLM Leaderboard

72.100
normalized accuracy on HellaSwag (10-Shot)
validation set Open LLM Leaderboard

88.440
accuracy on MMLU (5-Shot)
test set Open LLM Leaderboard

65.450
mc2 on TruthfulQA (0-shot)
validation set Open LLM Leaderboard

76.750
accuracy on Winogrande (5-shot)
validation set Open LLM Leaderboard

82.720
accuracy on GSM8k (5-shot)
test set Open LLM Leaderboard

59.210

View on Papers With Code