Model Card for Kurtis-E1.1-Qwen2.5-3B-Instruct

Kurtis E1.1, fine-tuned with Flower, the federated learning framework.
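For quick local inference, a minimal sketch with Hugging Face Transformers is shown below. It assumes the checkpoint ships a Qwen2.5-style chat template and that `transformers` (plus `accelerate` for `device_map="auto"`) is installed; the prompt is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires accelerate; use .to("cuda") otherwise.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build the prompt with the model's bundled chat template.
messages = [{"role": "user", "content": "What can you help me with?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```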

Eval Results

Evaluations were performed with the LM Evaluation Harness (`lm-eval`) on an NVIDIA A40 GPU.
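The CLI invocations and results for each task follow. For scripted runs, the same evaluations can also be driven from Python via the harness's `simple_evaluate` entry point; a minimal sketch, assuming a recent `lm-eval` release (`pip install lm-eval`) that exports it at package level:

```python
import lm_eval

# Mirrors the CLI invocations below: HF backend, zero-shot, batch size 8.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct",
    tasks=["hellaswag", "arc_easy", "arc_challenge", "mmlu"],
    device="cuda:0",
    batch_size=8,
)
print(results["results"])
```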

hellaswag

lm_eval --model hf --model_args pretrained=ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct --tasks hellaswag --device cuda:0 --batch_size 8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| hellaswag | 1 | none | 0 | acc ↑ | 0.5555 | ± 0.0050 |
| | | none | 0 | acc_norm ↑ | 0.7412 | ± 0.0044 |

arc_easy

lm_eval --model hf --model_args pretrained=ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct --tasks arc_easy --device cuda:0 --batch_size 8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_easy | 1 | none | 0 | acc ↑ | 0.7710 | ± 0.0086 |
| | | none | 0 | acc_norm ↑ | 0.6789 | ± 0.0096 |

arc_challenge

lm_eval --model hf --model_args pretrained=ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct --tasks arc_challenge --device cuda:0 --batch_size 8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc ↑ | 0.436 | ± 0.0145 |
| | | none | 0 | acc_norm ↑ | 0.448 | ± 0.0145 |

mmlu

lm_eval --model hf --model_args pretrained=ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct --tasks mmlu --device cuda:0 --batch_size 8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc ↑ | 0.6522 | ± 0.0038 |
| - humanities | 2 | none | | acc ↑ | 0.5734 | ± 0.0066 |
| - formal_logic | 1 | none | 0 | acc ↑ | 0.4603 | ± 0.0446 |
| - high_school_european_history | 1 | none | 0 | acc ↑ | 0.7939 | ± 0.0316 |
| - high_school_us_history | 1 | none | 0 | acc ↑ | 0.8333 | ± 0.0262 |
| - high_school_world_history | 1 | none | 0 | acc ↑ | 0.8397 | ± 0.0239 |
| - international_law | 1 | none | 0 | acc ↑ | 0.7769 | ± 0.0380 |
| - jurisprudence | 1 | none | 0 | acc ↑ | 0.7963 | ± 0.0389 |
| - logical_fallacies | 1 | none | 0 | acc ↑ | 0.7975 | ± 0.0316 |
| - moral_disputes | 1 | none | 0 | acc ↑ | 0.6850 | ± 0.0250 |
| - moral_scenarios | 1 | none | 0 | acc ↑ | 0.2905 | ± 0.0152 |
| - philosophy | 1 | none | 0 | acc ↑ | 0.7106 | ± 0.0258 |
| - prehistory | 1 | none | 0 | acc ↑ | 0.7438 | ± 0.0243 |
| - professional_law | 1 | none | 0 | acc ↑ | 0.4759 | ± 0.0128 |
| - world_religions | 1 | none | 0 | acc ↑ | 0.8246 | ± 0.0292 |
| - other | 2 | none | | acc ↑ | 0.7087 | ± 0.0079 |
| - business_ethics | 1 | none | 0 | acc ↑ | 0.7300 | ± 0.0446 |
| - clinical_knowledge | 1 | none | 0 | acc ↑ | 0.7321 | ± 0.0273 |
| - college_medicine | 1 | none | 0 | acc ↑ | 0.6705 | ± 0.0358 |
| - global_facts | 1 | none | 0 | acc ↑ | 0.3900 | ± 0.0490 |
| - human_aging | 1 | none | 0 | acc ↑ | 0.7130 | ± 0.0304 |
| - management | 1 | none | 0 | acc ↑ | 0.7961 | ± 0.0399 |
| - marketing | 1 | none | 0 | acc ↑ | 0.8803 | ± 0.0213 |
| - medical_genetics | 1 | none | 0 | acc ↑ | 0.7600 | ± 0.0429 |
| - miscellaneous | 1 | none | 0 | acc ↑ | 0.7957 | ± 0.0144 |
| - nutrition | 1 | none | 0 | acc ↑ | 0.7353 | ± 0.0253 |
| - professional_accounting | 1 | none | 0 | acc ↑ | 0.5426 | ± 0.0297 |
| - professional_medicine | 1 | none | 0 | acc ↑ | 0.6434 | ± 0.0291 |
| - virology | 1 | none | 0 | acc ↑ | 0.4880 | ± 0.0389 |
| - social sciences | 2 | none | | acc ↑ | 0.7618 | ± 0.0076 |
| - econometrics | 1 | none | 0 | acc ↑ | 0.5439 | ± 0.0469 |
| - high_school_geography | 1 | none | 0 | acc ↑ | 0.7677 | ± 0.0301 |
| - high_school_government_and_politics | 1 | none | 0 | acc ↑ | 0.8860 | ± 0.0229 |
| - high_school_macroeconomics | 1 | none | 0 | acc ↑ | 0.6949 | ± 0.0233 |
| - high_school_microeconomics | 1 | none | 0 | acc ↑ | 0.7773 | ± 0.0270 |
| - high_school_psychology | 1 | none | 0 | acc ↑ | 0.8477 | ± 0.0154 |
| - human_sexuality | 1 | none | 0 | acc ↑ | 0.7786 | ± 0.0364 |
| - professional_psychology | 1 | none | 0 | acc ↑ | 0.7075 | ± 0.0184 |
| - public_relations | 1 | none | 0 | acc ↑ | 0.6818 | ± 0.0446 |
| - security_studies | 1 | none | 0 | acc ↑ | 0.7224 | ± 0.0287 |
| - sociology | 1 | none | 0 | acc ↑ | 0.8458 | ± 0.0255 |
| - us_foreign_policy | 1 | none | 0 | acc ↑ | 0.8400 | ± 0.0368 |
| - stem | 2 | none | | acc ↑ | 0.6070 | ± 0.0085 |
| - abstract_algebra | 1 | none | 0 | acc ↑ | 0.4700 | ± 0.0502 |
| - anatomy | 1 | none | 0 | acc ↑ | 0.6667 | ± 0.0407 |
| - astronomy | 1 | none | 0 | acc ↑ | 0.6776 | ± 0.0380 |
| - college_biology | 1 | none | 0 | acc ↑ | 0.7222 | ± 0.0375 |
| - college_chemistry | 1 | none | 0 | acc ↑ | 0.5000 | ± 0.0503 |
| - college_computer_science | 1 | none | 0 | acc ↑ | 0.6000 | ± 0.0492 |
| - college_mathematics | 1 | none | 0 | acc ↑ | 0.3400 | ± 0.0476 |
| - college_physics | 1 | none | 0 | acc ↑ | 0.4902 | ± 0.0497 |
| - computer_security | 1 | none | 0 | acc ↑ | 0.7000 | ± 0.0461 |
| - conceptual_physics | 1 | none | 0 | acc ↑ | 0.6468 | ± 0.0312 |
| - electrical_engineering | 1 | none | 0 | acc ↑ | 0.6690 | ± 0.0392 |
| - elementary_mathematics | 1 | none | 0 | acc ↑ | 0.5979 | ± 0.0253 |
| - high_school_biology | 1 | none | 0 | acc ↑ | 0.8129 | ± 0.0222 |
| - high_school_chemistry | 1 | none | 0 | acc ↑ | 0.5813 | ± 0.0347 |
| - high_school_computer_science | 1 | none | 0 | acc ↑ | 0.7800 | ± 0.0416 |
| - high_school_mathematics | 1 | none | 0 | acc ↑ | 0.5037 | ± 0.0305 |
| - high_school_physics | 1 | none | 0 | acc ↑ | 0.4437 | ± 0.0406 |
| - high_school_statistics | 1 | none | 0 | acc ↑ | 0.5972 | ± 0.0334 |
| - machine_learning | 1 | none | 0 | acc ↑ | 0.4554 | ± 0.0473 |

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc ↑ | 0.6522 | ± 0.0038 |
| - humanities | 2 | none | | acc ↑ | 0.5734 | ± 0.0066 |
| - other | 2 | none | | acc ↑ | 0.7087 | ± 0.0079 |
| - social sciences | 2 | none | | acc ↑ | 0.7618 | ± 0.0076 |
| - stem | 2 | none | | acc ↑ | 0.6070 | ± 0.0085 |

mmlu (5-shot)

lm_eval --model hf --model_args pretrained=ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct --tasks mmlu --device cuda:0 --batch_size 8 --num_fewshot 5

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc ↑ | 0.6629 | ± 0.0038 |
| - humanities | 2 | none | | acc ↑ | 0.5862 | ± 0.0067 |
| - formal_logic | 1 | none | 5 | acc ↑ | 0.4683 | ± 0.0446 |
| - high_school_european_history | 1 | none | 5 | acc ↑ | 0.7818 | ± 0.0323 |
| - high_school_us_history | 1 | none | 5 | acc ↑ | 0.8284 | ± 0.0265 |
| - high_school_world_history | 1 | none | 5 | acc ↑ | 0.8692 | ± 0.0219 |
| - international_law | 1 | none | 5 | acc ↑ | 0.7769 | ± 0.0380 |
| - jurisprudence | 1 | none | 5 | acc ↑ | 0.7963 | ± 0.0389 |
| - logical_fallacies | 1 | none | 5 | acc ↑ | 0.8098 | ± 0.0308 |
| - moral_disputes | 1 | none | 5 | acc ↑ | 0.7110 | ± 0.0244 |
| - moral_scenarios | 1 | none | 5 | acc ↑ | 0.3464 | ± 0.0159 |
| - philosophy | 1 | none | 5 | acc ↑ | 0.7042 | ± 0.0259 |
| - prehistory | 1 | none | 5 | acc ↑ | 0.7284 | ± 0.0247 |
| - professional_law | 1 | none | 5 | acc ↑ | 0.4759 | ± 0.0128 |
| - world_religions | 1 | none | 5 | acc ↑ | 0.8304 | ± 0.0288 |
| - other | 2 | none | | acc ↑ | 0.7171 | ± 0.0078 |
| - business_ethics | 1 | none | 5 | acc ↑ | 0.7400 | ± 0.0441 |
| - clinical_knowledge | 1 | none | 5 | acc ↑ | 0.7321 | ± 0.0273 |
| - college_medicine | 1 | none | 5 | acc ↑ | 0.6647 | ± 0.0360 |
| - global_facts | 1 | none | 5 | acc ↑ | 0.4100 | ± 0.0494 |
| - human_aging | 1 | none | 5 | acc ↑ | 0.7220 | ± 0.0301 |
| - management | 1 | none | 5 | acc ↑ | 0.7864 | ± 0.0406 |
| - marketing | 1 | none | 5 | acc ↑ | 0.8889 | ± 0.0206 |
| - medical_genetics | 1 | none | 5 | acc ↑ | 0.7900 | ± 0.0409 |
| - miscellaneous | 1 | none | 5 | acc ↑ | 0.7957 | ± 0.0144 |
| - nutrition | 1 | none | 5 | acc ↑ | 0.7680 | ± 0.0242 |
| - professional_accounting | 1 | none | 5 | acc ↑ | 0.5532 | ± 0.0297 |
| - professional_medicine | 1 | none | 5 | acc ↑ | 0.6471 | ± 0.0290 |
| - virology | 1 | none | 5 | acc ↑ | 0.5120 | ± 0.0389 |
| - social sciences | 2 | none | | acc ↑ | 0.7735 | ± 0.0075 |
| - econometrics | 1 | none | 5 | acc ↑ | 0.5877 | ± 0.0463 |
| - high_school_geography | 1 | none | 5 | acc ↑ | 0.7828 | ± 0.0294 |
| - high_school_government_and_politics | 1 | none | 5 | acc ↑ | 0.8756 | ± 0.0238 |
| - high_school_macroeconomics | 1 | none | 5 | acc ↑ | 0.7051 | ± 0.0231 |
| - high_school_microeconomics | 1 | none | 5 | acc ↑ | 0.7773 | ± 0.0270 |
| - high_school_psychology | 1 | none | 5 | acc ↑ | 0.8550 | ± 0.0151 |
| - human_sexuality | 1 | none | 5 | acc ↑ | 0.8092 | ± 0.0345 |
| - professional_psychology | 1 | none | 5 | acc ↑ | 0.7288 | ± 0.0180 |
| - public_relations | 1 | none | 5 | acc ↑ | 0.6909 | ± 0.0443 |
| - security_studies | 1 | none | 5 | acc ↑ | 0.7551 | ± 0.0275 |
| - sociology | 1 | none | 5 | acc ↑ | 0.8308 | ± 0.0265 |
| - us_foreign_policy | 1 | none | 5 | acc ↑ | 0.8300 | ± 0.0378 |
| - stem | 2 | none | | acc ↑ | 0.6159 | ± 0.0084 |
| - abstract_algebra | 1 | none | 5 | acc ↑ | 0.5000 | ± 0.0503 |
| - anatomy | 1 | none | 5 | acc ↑ | 0.6222 | ± 0.0419 |
| - astronomy | 1 | none | 5 | acc ↑ | 0.7500 | ± 0.0352 |
| - college_biology | 1 | none | 5 | acc ↑ | 0.7083 | ± 0.0380 |
| - college_chemistry | 1 | none | 5 | acc ↑ | 0.4700 | ± 0.0502 |
| - college_computer_science | 1 | none | 5 | acc ↑ | 0.6200 | ± 0.0488 |
| - college_mathematics | 1 | none | 5 | acc ↑ | 0.4000 | ± 0.0492 |
| - college_physics | 1 | none | 5 | acc ↑ | 0.4902 | ± 0.0497 |
| - computer_security | 1 | none | 5 | acc ↑ | 0.8200 | ± 0.0386 |
| - conceptual_physics | 1 | none | 5 | acc ↑ | 0.6383 | ± 0.0314 |
| - electrical_engineering | 1 | none | 5 | acc ↑ | 0.6483 | ± 0.0398 |
| - elementary_mathematics | 1 | none | 5 | acc ↑ | 0.5820 | ± 0.0254 |
| - high_school_biology | 1 | none | 5 | acc ↑ | 0.8161 | ± 0.0220 |
| - high_school_chemistry | 1 | none | 5 | acc ↑ | 0.6059 | ± 0.0344 |
| - high_school_computer_science | 1 | none | 5 | acc ↑ | 0.7500 | ± 0.0435 |
| - high_school_mathematics | 1 | none | 5 | acc ↑ | 0.4926 | ± 0.0305 |
| - high_school_physics | 1 | none | 5 | acc ↑ | 0.4702 | ± 0.0408 |
| - high_school_statistics | 1 | none | 5 | acc ↑ | 0.6343 | ± 0.0328 |
| - machine_learning | 1 | none | 5 | acc ↑ | 0.4911 | ± 0.0475 |

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc ↑ | 0.6629 | ± 0.0038 |
| - humanities | 2 | none | | acc ↑ | 0.5862 | ± 0.0067 |
| - other | 2 | none | | acc ↑ | 0.7171 | ± 0.0078 |
| - social sciences | 2 | none | | acc ↑ | 0.7735 | ± 0.0075 |
| - stem | 2 | none | | acc ↑ | 0.6159 | ± 0.0084 |
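To keep the raw numbers around for later comparison, each run can be repeated with the harness's `--output_path` flag, which writes a JSON results file. Below is a minimal sketch for reading the headline MMLU accuracy back out; the timestamped file layout and the `acc,none` metric key follow recent harness conventions and may differ across versions:

```python
import glob
import json

# Pick the newest results file written by: lm_eval ... --output_path results/
path = sorted(glob.glob("results/**/*.json", recursive=True))[-1]
with open(path) as f:
    data = json.load(f)

# Metric keys are "<metric>,<filter>", e.g. "acc,none".
print(data["results"]["mmlu"]["acc,none"])
```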
Safetensors checkpoint: 3.09B params; tensor types F32 and BF16.

Model tree for ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct

Base model: Qwen/Qwen2.5-3B (this model is a fine-tune)
Quantizations: 2 models
