---
library_name: transformers
tags: []
---

# Model Card for STEP

This model is pre-trained to perform (random) syntactic transformations of English sentences. The prefix given to the model decides which syntactic transformation to apply.

See [Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations](https://arxiv.org/abs/2407.04543) for full details.
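
As a quick illustration, here is a minimal inference sketch. The checkpoint id `namednil/step` and the `<prefix>` string are assumptions for illustration only; the actual prefix inventory is defined by the pre-training setup (see the repository linked below).

```python
# Minimal sketch; "namednil/step" and "<prefix>" are placeholders -- the
# actual prefix inventory is defined by the pre-training setup (see repo).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("namednil/step")
model = AutoModelForSeq2SeqLM.from_pretrained("namednil/step")

# The prefix selects which syntactic transformation the model applies.
inputs = tokenizer("<prefix> The cat chased the dog .", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```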

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** Matthias Lindemann
- **Funded by:** UKRI, Huawei, Dutch National Science Foundation
- **Model type:** Sequence-to-Sequence model
- **Language(s) (NLP):** English
- **License:** [More Information Needed]
- **Finetuned from model:** T5-Base

### Model Sources

- **Repository:** https://github.com/namednil/step
- **Paper:** [Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations](https://arxiv.org/abs/2407.04543)

## Uses

Syntax-sensitive sequence-to-sequence tasks for English, such as passivization, semantic parsing, and question formation.

### Direct Use

This model needs to be fine-tuned for a target task: out of the box, it implements random syntactic transformations. A sketch of a fine-tuning setup is shown below.
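
The following is a minimal fine-tuning sketch using the standard 🤗 `Seq2SeqTrainer` API. The checkpoint id (`namednil/step`), the toy dataset, and all hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# Illustrative sketch only: the checkpoint id ("namednil/step"), the toy
# dataset, and all hyperparameters are assumptions, not the paper's setup.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("namednil/step")
model = AutoModelForSeq2SeqLM.from_pretrained("namednil/step")

# Toy fine-tuning task: question formation.
train = Dataset.from_list([
    {"src": "the dog that ate slept .", "tgt": "did the dog that ate sleep ?"},
    # ... more pairs ...
])

def preprocess(example):
    enc = tokenizer(example["src"], truncation=True)
    enc["labels"] = tokenizer(text_target=example["tgt"], truncation=True)["input_ids"]
    return enc

train = train.map(preprocess, remove_columns=["src", "tgt"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="step-finetuned",
                                  per_device_train_batch_size=8,
                                  num_train_epochs=10),
    train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```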

## Bias, Risks, and Limitations

The model is based on T5 and was exposed to the C4 corpus (T5's pre-training data), so it likely inherits biases from both.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## Model Examination

We identified the following interpretable transformation look-up heads for UD relations (see the paper for details). Heads are listed as (layer, head) pairs, both 0-indexed:

```python
{'cop': [(0, 3), (4, 11), (7, 11), (8, 11), (9, 5), (9, 6), (10, 5), (11, 11)],
 'expl': [(0, 7), (7, 11), (8, 2), (8, 11), (9, 6), (9, 7), (11, 11)],
 'amod': [(4, 6), (6, 6), (7, 11), (8, 0), (8, 11), (9, 5), (11, 11)],
 'compound': [(4, 6), (6, 6), (7, 6), (7, 11), (8, 11), (9, 5), (9, 7), (9, 11), (11, 11)],
 'det': [(4, 6), (7, 11), (8, 11), (9, 5), (9, 6), (10, 5)],
 'nmod:poss': [(4, 6), (4, 11), (7, 11), (8, 11), (9, 5), (9, 6), (11, 11)],
 'advmod': [(4, 11), (6, 6), (7, 11), (8, 11), (9, 5), (9, 6), (9, 11), (11, 11)],
 'aux': [(4, 11), (7, 11), (8, 11), (9, 5), (9, 6), (10, 5), (11, 11)],
 'mark': [(4, 11), (8, 11), (9, 5), (9, 6), (11, 11)],
 'fixed': [(5, 5), (8, 2), (8, 6), (9, 4), (9, 6), (10, 1), (10, 4), (10, 6), (10, 11), (11, 11)],
 'compound:prt': [(6, 2), (6, 6), (7, 11), (8, 2), (8, 6), (9, 4), (9, 6), (10, 4), (10, 6), (10, 11), (11, 11)],
 'acl': [(6, 6), (7, 11), (8, 2), (9, 4), (10, 6), (10, 11), (11, 11)],
 'nummod': [(6, 6), (7, 11), (8, 11), (9, 6), (11, 11)],
 'flat': [(6, 11), (7, 11), (8, 2), (8, 11), (9, 4), (10, 6), (10, 11), (11, 11)],
 'aux:pass': [(7, 11), (8, 11), (9, 5), (9, 6), (10, 5), (11, 11)],
 'iobj': [(7, 11), (10, 4), (10, 11)],
 'nsubj': [(7, 11), (8, 11), (9, 5), (9, 6), (9, 11), (11, 11)],
 'obj': [(7, 11), (10, 4), (10, 6), (10, 11), (11, 11)],
 'obl:tmod': [(7, 11), (9, 4), (10, 4), (10, 6), (11, 11)],
 'case': [(8, 11), (9, 5)],
 'cc': [(8, 11), (9, 5), (9, 6), (11, 11)],
 'obl:npmod': [(8, 11), (9, 6), (9, 11), (10, 6), (11, 11)],
 'punct': [(8, 11), (9, 6), (10, 6), (10, 11), (11, 5)],
 'csubj': [(9, 11), (10, 6), (11, 11)],
 'nsubj:pass': [(9, 11), (10, 6), (11, 11)],
 'obl': [(9, 11), (10, 6)],
 'acl:relcl': [(10, 6)],
 'advcl': [(10, 6), (11, 11)],
 'appos': [(10, 6), (10, 11), (11, 11)],
 'ccomp': [(10, 6)],
 'conj': [(10, 6)],
 'nmod': [(10, 6), (10, 11)],
 'vocative': [(10, 6)],
 'xcomp': [(10, 6), (10, 11)]}
```
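
These heads can be inspected with the standard `output_attentions` mechanism. The sketch below assumes the checkpoint id `namednil/step` and that the listed heads are encoder self-attention heads; see the paper for the exact setting.

```python
# Sketch: inspect head (10, 6), listed above for 'acl:relcl'. Assumes the
# checkpoint id "namednil/step" and that the listed heads are encoder
# self-attention heads; see the paper for the exact setting.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("namednil/step")
model = AutoModelForSeq2SeqLM.from_pretrained("namednil/step")

inputs = tokenizer("the dog that chased the cat slept .", return_tensors="pt")
with torch.no_grad():
    enc = model.encoder(**inputs, output_attentions=True)

layer, head = 10, 6                    # 0-based, as in the dictionary above
attn = enc.attentions[layer][0, head]  # (seq_len, seq_len) attention weights
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for i, tok in enumerate(tokens):
    print(f"{tok:>12} attends most to {tokens[int(attn[i].argmax())]}")
```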

## Environmental Impact

- **Hardware Type:** NVIDIA GeForce RTX 2080 Ti
- **Hours used:** 30

## Technical Specifications

### Model Architecture and Objective

T5-Base: 12 encoder and 12 decoder layers, hidden dimensionality of 768.
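
These constants can be verified against the checkpoint's configuration; a small sketch, again assuming the id `namednil/step`:

```python
# Assumes the checkpoint id "namednil/step"; prints the architecture
# constants stated above (T5-Base: 12 layers per stack, d_model = 768).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("namednil/step")
print(cfg.num_layers, cfg.num_decoder_layers, cfg.d_model, cfg.num_heads)
# expected for T5-Base: 12 12 768 12
```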

## Citation

**BibTeX:**
```
@misc{lindemann2024strengtheningstructuralinductivebiases,
      title={Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations},
      author={Matthias Lindemann and Alexander Koller and Ivan Titov},
      year={2024},
      eprint={2407.04543},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.04543},
}
```