| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|-----------------|--------:|--------:|-----------:|---------:|--------:|
| Lelantos-DPO-7B | 45.47 | 75.00 | 67.05 | 46.64 | 58.54 |
| Lelantos-7B | 46.01 | 75.00 | 64.93 | 46.21 | 58.04 |
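
The Average column is the unweighted mean of the four suite scores, and the detail tables below are consistent with each suite score in turn being an unweighted mean of its per-task values (acc_norm where reported, otherwise acc, mc2, or multiple_choice_grade). A minimal sketch of the arithmetic:

```python
# Reproduce the Average column for Lelantos-DPO-7B from the four suite scores.
suite_scores = {
    "AGIEval": 45.47,
    "GPT4All": 75.00,
    "TruthfulQA": 67.05,
    "Bigbench": 46.64,
}
average = sum(suite_scores.values()) / len(suite_scores)
print(f"{average:.2f}")  # -> 58.54
```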

### AGIEval

| Task | Version | Metric | Value | | Stderr |
|------|--------:|--------|------:|---|-------:|
| agieval_aqua_rat | 0 | acc | 25.20 | ± | 2.73 |
| | | acc_norm | 24.02 | ± | 2.69 |
| agieval_logiqa_en | 0 | acc | 40.71 | ± | 1.93 |
| | | acc_norm | 40.25 | ± | 1.92 |
| agieval_lsat_ar | 0 | acc | 24.35 | ± | 2.84 |
| | | acc_norm | 23.04 | ± | 2.78 |
| agieval_lsat_lr | 0 | acc | 55.69 | ± | 2.20 |
| | | acc_norm | 55.49 | ± | 2.20 |
| agieval_lsat_rc | 0 | acc | 65.06 | ± | 2.91 |
| | | acc_norm | 65.43 | ± | 2.91 |
| agieval_sat_en | 0 | acc | 76.70 | ± | 2.95 |
| | | acc_norm | 76.70 | ± | 2.95 |
| agieval_sat_en_without_passage | 0 | acc | 47.09 | ± | 3.49 |
| | | acc_norm | 45.63 | ± | 3.48 |
| agieval_sat_math | 0 | acc | 36.36 | ± | 3.25 |
| | | acc_norm | 33.18 | ± | 3.18 |

Average: 45.47%

### GPT4All

| Task | Version | Metric | Value | | Stderr |
|------|--------:|--------|------:|---|-------:|
| arc_challenge | 0 | acc | 62.12 | ± | 1.42 |
| | | acc_norm | 63.23 | ± | 1.41 |
| arc_easy | 0 | acc | 85.40 | ± | 0.72 |
| | | acc_norm | 81.02 | ± | 0.80 |
| boolq | 1 | acc | 87.25 | ± | 0.58 |
| hellaswag | 0 | acc | 67.97 | ± | 0.47 |
| | | acc_norm | 85.48 | ± | 0.35 |
| openbookqa | 0 | acc | 36.80 | ± | 2.16 |
| | | acc_norm | 47.20 | ± | 2.23 |
| piqa | 0 | acc | 81.88 | ± | 0.90 |
| | | acc_norm | 83.57 | ± | 0.86 |
| winogrande | 0 | acc | 77.27 | ± | 1.18 |

Average: 75.00%
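
These per-task rows follow EleutherAI lm-evaluation-harness output conventions. As a hedged sketch only (the harness version or fork behind these numbers is not stated, so task names, versions, and prompts may not match exactly), the GPT4All suite could be rerun via the harness's Python API:

```python
# Hypothetical reproduction sketch using EleutherAI's lm-evaluation-harness
# (pip install lm-eval). The exact harness setup behind the table above is
# not documented here, so results may not match digit for digit.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=SanjiWatsuki/Lelantos-DPO-7B,dtype=float16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])  # per-task acc / acc_norm, as tabulated above
```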

### TruthfulQA

| Task | Version | Metric | Value | | Stderr |
|------|--------:|--------|------:|---|-------:|
| truthfulqa_mc | 1 | mc1 | 49.94 | ± | 1.75 |
| | | mc2 | 67.05 | ± | 1.53 |

Average: 67.05%

### Bigbench

| Task | Version | Metric | Value | | Stderr |
|------|--------:|--------|------:|---|-------:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 58.95 | ± | 3.58 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 64.23 | ± | 2.50 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 36.43 | ± | 3.00 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 23.68 | ± | 2.25 |
| | | exact_str_match | 3.90 | ± | 1.02 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 33.40 | ± | 2.11 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 24.43 | ± | 1.63 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 54.33 | ± | 2.88 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 52.20 | ± | 2.24 |
| bigbench_navigate | 0 | multiple_choice_grade | 52.70 | ± | 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 69.65 | ± | 1.03 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 50.22 | ± | 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 40.98 | ± | 1.56 |
| bigbench_snarks | 0 | multiple_choice_grade | 72.38 | ± | 3.33 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 73.23 | ± | 1.41 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 39.90 | ± | 1.55 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 20.88 | ± | 1.15 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.60 | ± | 0.91 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 54.33 | ± | 2.88 |

Average: 46.64%

Average score: 58.54%
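
For reference, a minimal inference sketch using the transformers library; float16 matches the precision the weights are published in, while the prompt and generation settings below are placeholder assumptions:

```python
# Minimal sketch: load SanjiWatsuki/Lelantos-DPO-7B in FP16 and generate.
# The prompt format here is an assumption; check the base model card for
# the expected chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SanjiWatsuki/Lelantos-DPO-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Write a haiku about benchmarks.", return_tensors="pt")
inputs = inputs.to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```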
