---
library_name: transformers
tags: []
---

# Model Card for STEP

This model is pre-trained to perform (random) syntactic transformations of English sentences. The prefix given to the model determines which syntactic transformation to apply. See [Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations](https://arxiv.org/abs/2407.04543) for full details.

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** Matthias Lindemann
- **Funded by:** UKRI, Huawei, Dutch National Science Foundation
- **Model type:** Sequence-to-Sequence model
- **Language(s) (NLP):** English
- **License:** [More Information Needed]
- **Finetuned from model:** T5-Base

### Model Sources

- **Repository:** https://github.com/namednil/step
- **Paper:** [Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations](https://arxiv.org/abs/2407.04543)

## Uses

Syntax-sensitive sequence-to-sequence tasks for English, such as passivization, semantic parsing, question formation, ...

### Direct Use

This model needs to be fine-tuned on a downstream task, since out of the box it only performs the random syntactic transformations used for pre-training. A minimal fine-tuning sketch is given at the end of this card.

## Bias, Risks, and Limitations

The model is based on T5 and was exposed to the C4 corpus (T5's pre-training data), and hence likely inherits biases from both.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## Model Examination

We identified the following interpretable transformation look-up heads for UD relations (see the paper for details). Each entry is given as (layer, head), with both indices 0-based; a sketch of how such a head can be inspected follows the listing.

```python
{'cop': [(0, 3), (4, 11), (7, 11), (8, 11), (9, 5), (9, 6), (10, 5), (11, 11)],
 'expl': [(0, 7), (7, 11), (8, 2), (8, 11), (9, 6), (9, 7), (11, 11)],
 'amod': [(4, 6), (6, 6), (7, 11), (8, 0), (8, 11), (9, 5), (11, 11)],
 'compound': [(4, 6), (6, 6), (7, 6), (7, 11), (8, 11), (9, 5), (9, 7), (9, 11), (11, 11)],
 'det': [(4, 6), (7, 11), (8, 11), (9, 5), (9, 6), (10, 5)],
 'nmod:poss': [(4, 6), (4, 11), (7, 11), (8, 11), (9, 5), (9, 6), (11, 11)],
 'advmod': [(4, 11), (6, 6), (7, 11), (8, 11), (9, 5), (9, 6), (9, 11), (11, 11)],
 'aux': [(4, 11), (7, 11), (8, 11), (9, 5), (9, 6), (10, 5), (11, 11)],
 'mark': [(4, 11), (8, 11), (9, 5), (9, 6), (11, 11)],
 'fixed': [(5, 5), (8, 2), (8, 6), (9, 4), (9, 6), (10, 1), (10, 4), (10, 6), (10, 11), (11, 11)],
 'compound:prt': [(6, 2), (6, 6), (7, 11), (8, 2), (8, 6), (9, 4), (9, 6), (10, 4), (10, 6), (10, 11), (11, 11)],
 'acl': [(6, 6), (7, 11), (8, 2), (9, 4), (10, 6), (10, 11), (11, 11)],
 'nummod': [(6, 6), (7, 11), (8, 11), (9, 6), (11, 11)],
 'flat': [(6, 11), (7, 11), (8, 2), (8, 11), (9, 4), (10, 6), (10, 11), (11, 11)],
 'aux:pass': [(7, 11), (8, 11), (9, 5), (9, 6), (10, 5), (11, 11)],
 'iobj': [(7, 11), (10, 4), (10, 11)],
 'nsubj': [(7, 11), (8, 11), (9, 5), (9, 6), (9, 11), (11, 11)],
 'obj': [(7, 11), (10, 4), (10, 6), (10, 11), (11, 11)],
 'obl:tmod': [(7, 11), (9, 4), (10, 4), (10, 6), (11, 11)],
 'case': [(8, 11), (9, 5)],
 'cc': [(8, 11), (9, 5), (9, 6), (11, 11)],
 'obl:npmod': [(8, 11), (9, 6), (9, 11), (10, 6), (11, 11)],
 'punct': [(8, 11), (9, 6), (10, 6), (10, 11), (11, 5)],
 'csubj': [(9, 11), (10, 6), (11, 11)],
 'nsubj:pass': [(9, 11), (10, 6), (11, 11)],
 'obl': [(9, 11), (10, 6)],
 'acl:relcl': [(10, 6)],
 'advcl': [(10, 6), (11, 11)],
 'appos': [(10, 6), (10, 11), (11, 11)],
 'ccomp': [(10, 6)],
 'conj': [(10, 6)],
 'nmod': [(10, 6), (10, 11)],
 'vocative': [(10, 6)],
 'xcomp': [(10, 6), (10, 11)]}
```
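To probe one of these heads, attention weights can be requested from the model. The sketch below is illustrative only: the Hub ID and the input sentence are placeholders, and it assumes the listed indices refer to encoder self-attention heads; see the paper for the exact analysis setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "namednil/step"  # placeholder Hub ID; replace with this model's actual repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("The cat is happy .", return_tensors="pt")
with torch.no_grad():
    out = model.get_encoder()(**inputs, output_attentions=True)

# out.attentions holds one tensor per layer, each of shape (batch, num_heads, seq_len, seq_len).
layer, head = 9, 5  # e.g. one of the (layer, head) pairs listed for 'cop' above
attn = out.attentions[layer][0, head]
print(attn.argmax(dim=-1))  # for each token, the position it attends to most strongly
```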
## Environmental Impact

- **Hardware Type:** NVIDIA GeForce RTX 2080 Ti
- **Hours used:** 30

## Technical Specifications

### Model Architecture and Objective

T5-Base with 12 layers and a hidden dimensionality of 768.

## Citation

**BibTeX:**

```
@misc{lindemann2024strengtheningstructuralinductivebiases,
  title={Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations},
  author={Matthias Lindemann and Alexander Koller and Ivan Titov},
  year={2024},
  eprint={2407.04543},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2407.04543},
}
```
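## How to Get Started with the Model

As noted under Direct Use, this checkpoint is intended to be fine-tuned on a downstream syntax-sensitive task. The snippet below is a minimal sketch of how that might look with 🤗 Transformers; the Hub ID, the example sentence pair, and the optimizer settings are illustrative placeholders, not the setup used in the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "namednil/step"  # placeholder Hub ID; replace with this model's actual repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Toy fine-tuning step on a single (input, output) pair for a downstream
# transformation (here: passivization); a real setup would use a DataLoader or Trainer.
inputs = tokenizer("The cat chased the dog .", return_tensors="pt")
labels = tokenizer("The dog was chased by the cat .", return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
```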