Evaluation

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| - medmcqa | Yaml | none | 0 | acc | 0.5408 | ± 0.0077 |
| | | none | 0 | acc_norm | 0.5408 | ± 0.0077 |
| - medqa_4options | Yaml | none | 0 | acc | 0.5711 | ± 0.0139 |
| | | none | 0 | acc_norm | 0.5711 | ± 0.0139 |
| - anatomy (mmlu) | 0 | none | 0 | acc | 0.6815 | ± 0.0402 |
| - clinical_knowledge (mmlu) | 0 | none | 0 | acc | 0.7434 | ± 0.0269 |
| - college_biology (mmlu) | 0 | none | 0 | acc | 0.8056 | ± 0.0331 |
| - college_medicine (mmlu) | 0 | none | 0 | acc | 0.6647 | ± 0.0360 |
| - medical_genetics (mmlu) | 0 | none | 0 | acc | 0.7300 | ± 0.0446 |
| - professional_medicine (mmlu) | 0 | none | 0 | acc | 0.7353 | ± 0.0268 |
| stem | N/A | none | 0 | acc_norm | 0.5478 | ± 0.0067 |
| | | none | 0 | acc | 0.5909 | ± 0.0058 |
| - pubmedqa | 1 | none | 0 | acc | 0.7620 | ± 0.0191 |

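
The Stderr column is consistent with the standard error of a mean of per-example 0/1 scores, sqrt(acc · (1 − acc) / (n − 1)), where n is the number of evaluated examples. The sketch below recomputes a few entries from the reported accuracies. The split sizes it uses (4183 for the MedMCQA validation set, 1273 for the MedQA 4-option test set, 135 for MMLU anatomy, 500 for PubMedQA) are assumptions about the evaluation splits, not figures stated in this card.

```python
import math


def binary_acc_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n per-example 0/1 scores.

    Uses the sample variance (n - 1 denominator), which simplifies to
    sqrt(acc * (1 - acc) / (n - 1)).
    """
    if n < 2:
        raise ValueError("need at least two examples")
    sample_var = acc * (1.0 - acc) * n / (n - 1)
    return math.sqrt(sample_var / n)


# Reported accuracies from the table; split sizes are ASSUMED, not from the card.
REPORTED = {
    "medmcqa": (0.5408, 4183),
    "medqa_4options": (0.5711, 1273),
    "anatomy (mmlu)": (0.6815, 135),
    "pubmedqa": (0.7620, 500),
}

for task, (acc, n) in REPORTED.items():
    print(f"{task}: acc={acc:.4f} ± {binary_acc_stderr(acc, n):.4f}")
```

With these assumed sizes, each printed value matches the Stderr column to four decimal places.
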

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| stem | N/A | none | 0 | acc_norm | 0.5478 | ± 0.0067 |
| | | none | 0 | acc | 0.5909 | ± 0.0058 |
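
The stem group scores are consistent with a size-weighted mean (micro-average) over the subtasks that report each metric: acc pools all nine tasks, while acc_norm pools only medmcqa and medqa_4options. The sketch below checks this under the same assumed split sizes as above (MMLU test sizes of 135, 265, 144, 173, 100 and 272 are also assumptions, not from the card). The helper `size_weighted_mean` is hypothetical, written for illustration.

```python
# Reported values from the table; split sizes are ASSUMED, not from the card.
SUBTASKS = {
    # task: (n_examples, {metric: reported value})
    "medmcqa": (4183, {"acc": 0.5408, "acc_norm": 0.5408}),
    "medqa_4options": (1273, {"acc": 0.5711, "acc_norm": 0.5711}),
    "anatomy": (135, {"acc": 0.6815}),
    "clinical_knowledge": (265, {"acc": 0.7434}),
    "college_biology": (144, {"acc": 0.8056}),
    "college_medicine": (173, {"acc": 0.6647}),
    "medical_genetics": (100, {"acc": 0.7300}),
    "professional_medicine": (272, {"acc": 0.7353}),
    "pubmedqa": (500, {"acc": 0.7620}),
}


def size_weighted_mean(subtasks, metric):
    """Micro-average `metric` over the subtasks that report it.

    Each rounded accuracy is converted back to an integer count of correct
    answers, so the result is total_correct / total_examples.
    """
    correct = 0
    total = 0
    for n, metrics in subtasks.values():
        if metric in metrics:
            correct += round(metrics[metric] * n)
            total += n
    return correct / total


for metric in ("acc", "acc_norm"):
    print(f"stem {metric}: {size_weighted_mean(SUBTASKS, metric):.4f}")
```

Under these assumptions it prints 0.5909 for acc and 0.5478 for acc_norm, matching the Groups table. An unweighted mean of the nine acc values would be about 0.693, so the group score is dominated by medmcqa, by far the largest split.
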

[Figure: comparison image]

Model size: 3.82B params (Safetensors)
Tensor type: BF16