# Results

The results below were obtained by running `score_predictions.py` from the [babylm evaluation pipeline](https://github.com/babylm/evaluation-pipeline-2024) on the `ELC_ParserBERT_10M_textonly_predictions.json.gz` file in this directory. That file contains the model's predictions for each evaluation task.

## Overall Results

Average score per section and the macroaverage, compared with the baseline models:

| Model | BLiMP | BLiMP Supplement | EWoK | GLUE | *Macroaverage* |
| --- | --- | --- | --- | --- | --- |
| BabyLlama | 69.8 | 59.5 | 50.7 | 63.3 | 60.8 |
| LTG-BERT | 60.6 | 60.8 | 48.9 | 60.3 | 57.7 |
| ELC-ParserBERT | 59.6 | 57.7 | 63.1 | 44.5 | 56.2 |

## Breakdown per Section

Subtask scores below are on a 0 to 1 scale. The section averages in the overall table are the same values multiplied by 100.

| glue subtask | Score |
| ------------ | ----- |
| cola (MCC) | 0.042 |
| sst2 | 0.502 |
| mrpc (F1) | 0.82 |
| qqp (F1) | 0 |
| mnli | 0.357 |
| mnli-mm | 0.355 |
| qnli | 0.491 |
| rte | 0.496 |
| boolq | 0.585 |
| multirc | 0.63 |
| wsc | 0.615 |
| *Average* | 0.445 |

| blimp subtask | Score |
| --------------------------------------------------- | ------- |
| adjunct_island | 0.712 |
| anaphor_gender_agreement | 0.593 |
| anaphor_number_agreement | 0.647 |
| animate_subject_passive | 0.594 |
| animate_subject_trans | 0.47 |
| causative | 0.726 |
| complex_NP_island | 0.447 |
| coordinate_structure_constraint_complex_left_branch | 0.39 |
| coordinate_structure_constraint_object_extraction | 0.806 |
| determiner_noun_agreement_1 | 0.793 |
| determiner_noun_agreement_2 | 0.936 |
| determiner_noun_agreement_irregular_1 | 0.467 |
| determiner_noun_agreement_irregular_2 | 0.394 |
| determiner_noun_agreement_with_adj_2 | 0.889 |
| determiner_noun_agreement_with_adj_irregular_1 | 0.834 |
| determiner_noun_agreement_with_adj_irregular_2 | 0.848 |
| determiner_noun_agreement_with_adjective_1 | 0.758 |
| distractor_agreement_relational_noun | 0.212 |
| distractor_agreement_relative_clause | 0.282 |
| drop_argument | 0.485 |
| ellipsis_n_bar_1 | 0.505 |
| ellipsis_n_bar_2 | 0.342 |
| existential_there_object_raising | 0.447 |
| existential_there_quantifiers_1 | 0.385 |
| existential_there_quantifiers_2 | 0.396 |
| existential_there_subject_raising | 0.476 |
| expletive_it_object_raising | 0.44 |
| inchoative | 0.527 |
| intransitive | 0.484 |
| irregular_past_participle_adjectives | 0.348 |
| irregular_past_participle_verbs | 0.594 |
| irregular_plural_subject_verb_agreement_1 | 0.634 |
| irregular_plural_subject_verb_agreement_2 | 0.687 |
| left_branch_island_echo_question | 0.634 |
| left_branch_island_simple_question | 0.615 |
| matrix_question_npi_licensor_present | 0.206 |
| npi_present_1 | 0.362 |
| npi_present_2 | 0.347 |
| only_npi_licensor_present | 0.964 |
| only_npi_scope | 0.89 |
| passive_1 | 0.514 |
| passive_2 | 0.482 |
| principle_A_c_command | 0.635 |
| principle_A_case_1 | 0.999 |
| principle_A_case_2 | 0.78 |
| principle_A_domain_1 | 0.893 |
| principle_A_domain_2 | 0.623 |
| principle_A_domain_3 | 0.556 |
| principle_A_reconstruction | 0.339 |
| regular_plural_subject_verb_agreement_1 | 0.628 |
| regular_plural_subject_verb_agreement_2 | 0.663 |
| sentential_negation_npi_licensor_present | 0.93 |
| sentential_negation_npi_scope | 0.722 |
| sentential_subject_island | 0.361 |
| superlative_quantifiers_1 | 0.702 |
| superlative_quantifiers_2 | 0.498 |
| tough_vs_raising_1 | 0.351 |
| tough_vs_raising_2 | 0.648 |
| transitive | 0.645 |
| wh_island | 0.719 |
| wh_questions_object_gap | 0.657 |
| wh_questions_subject_gap | 0.861 |
| wh_questions_subject_gap_long_distance | 0.937 |
| wh_vs_that_no_gap | 0.969 |
| wh_vs_that_no_gap_long_distance | 0.969 |
| wh_vs_that_with_gap | 0.222 |
| wh_vs_that_with_gap_long_distance | 0.063 |
| *Average* | 0.596 |

| blimp_supplement subtask | Score |
| -------------------------- | ------- |
| hypernym | 0.531 |
| qa_congruence_easy | 0.641 |
| qa_congruence_tricky | 0.521 |
| subject_aux_inversion | 0.614 |
| turn_taking | 0.579 |
| *Average* | 0.577 |

| ewok subtask | Score |
| ----------------------- | ------- |
| agent-properties | 0.738 |
| material-dynamics | 0.81 |
| material-properties | 0.6 |
| physical-dynamics | 0.383 |
| physical-interactions | 0.599 |
| physical-relations | 0.817 |
| quantitative-properties | 0.427 |
| social-interactions | 0.565 |
| social-properties | 0.561 |
| social-relations | 0.807 |
| spatial-relations | 0.635 |
| *Average* | 0.631 |
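The reported averages appear consistent with simple unweighted means. Each section average is the mean of its subtask scores, and the macroaverage is the mean of the four section averages. The sketch below reproduces the ELC-ParserBERT GLUE average and macroaverage from the numbers in the tables. This is an assumption inferred from the figures; it is not taken from the `score_predictions.py` source, and the dictionary names are hypothetical.

```python
from statistics import mean

# GLUE subtask scores for ELC-ParserBERT, copied from the glue table
# (metric per subtask as labelled there, e.g. MCC for cola).
glue = {
    "cola": 0.042, "sst2": 0.502, "mrpc": 0.82, "qqp": 0.0,
    "mnli": 0.357, "mnli-mm": 0.355, "qnli": 0.491, "rte": 0.496,
    "boolq": 0.585, "multirc": 0.63, "wsc": 0.615,
}

# Assumption: a section average is the unweighted mean over its subtasks.
glue_avg = mean(glue.values())

# Section averages for ELC-ParserBERT on the 0-100 scale of the overall table.
sections = {"BLiMP": 59.6, "BLiMP Supplement": 57.7, "EWoK": 63.1, "GLUE": 44.5}

# Assumption: the macroaverage is the unweighted mean of the section averages.
macro = mean(sections.values())

print(f"GLUE average: {glue_avg:.3f}")  # prints: GLUE average: 0.445
print(f"Macroaverage: {macro:.1f}")     # prints: Macroaverage: 56.2
```

The same calculation also reproduces the BLiMP (0.596), BLiMP Supplement (0.577) and EWoK (0.631) averages from their subtask tables.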