The levels of accuracy and robustness recently achieved by statistical parsers (e.g. Collins (1999), Charniak (2000)) have led to their use in a number of NLP applications, such as question-answering (Pasca and Harabagiu, 2001), machine translation (Charniak et al., 2003), sentence simplification (Carroll et al., 1999), and a linguist's search engine (Resnik and Elkiss, 2003). Such parsers typically return phrase-structure trees in the style of the Penn Treebank, but without traces and co-indexation. However, the usefulness of this output is limited, since the underlying meaning (as represented in a predicate-argument structure or logical form) is difficult to reconstruct from such skeletal parse trees.

In this paper we demonstrate how a wide-coverage statistical parser using Combinatory Categorial Grammar (CCG) can be used to generate semantic representations. There are a number of advantages to using CCG for this task. First, CCG provides "surface compositional" analysis of certain syntactic phenomena such as coordination and extraction, allowing the logical form to be obtained for such cases in a straightforward way. Second, CCG is a lexicalised grammar, and only uses a small number of semantically transparent combinatory rules to combine CCG categories. Hence providing a compositional semantics for CCG simply amounts to assigning semantic representations to the lexical entries and interpreting the combinatory rules. And third, there exist highly accurate, efficient and robust CCG parsers which can be used directly for this task (Clark and Curran, 2004b; Hockenmaier, 2003).

The existing CCG parsers deliver predicate-argument structures, but not semantic representations that can be used for inference. The present paper seeks to extend one of these wide-coverage parsers by using it to build logical forms suitable for use in various NLP applications that require semantic interpretation.

We show how to construct first-order representations from CCG derivations using the λ-calculus, and demonstrate that semantic representations can be produced for over 97% of the sentences in unseen WSJ text. The only other deep parser we are aware of to achieve such levels of robustness for the WSJ is Kaplan et al. (2004). The use of the λ-calculus is integral to our method. However, first-order representations are simply used as a proof-of-concept; we could have used DRSs (Kamp and Reyle, 1993) or some other representation more tailored to the application in hand.

There is some existing work with a similar motivation to ours. Briscoe and Carroll (2002) generate underspecified semantic representations from their robust parser. Toutanova et al. (2002) and Kaplan et al. (2004) combine statistical methods with a linguistically motivated grammar formalism (HPSG and LFG respectively) in an attempt to achieve levels of robustness and accuracy comparable to the Penn Treebank parsers (which Kaplan et al. do achieve). However, there is a key difference between these approaches and ours. In our approach the creation of the semantic representations forms a completely [...]

[Figure: example sentence "It could cost taxpayers 15 million to install and residents 1 million a year to maintain"; category label NP]
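The claim that a compositional semantics for CCG amounts to assigning semantic representations to lexical entries and interpreting the combinatory rules can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the implementation described in this paper: the toy lexicon, the class names `Atom`, `Functor` and `Sign`, the use of Python closures in place of λ-terms, and the rendering of first-order formulas as plain strings. Only forward and backward application are shown.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Atom:
    """Atomic CCG category such as S, NP or N."""
    name: str


@dataclass(frozen=True)
class Functor:
    """Complex category: result/arg looks right, result\\arg looks left."""
    result: "Category"
    slash: str  # "/" or "\\"
    arg: "Category"


Category = Union[Atom, Functor]


@dataclass(frozen=True)
class Sign:
    """A category paired with its semantics.

    The semantics is either a string (a constant or a finished formula)
    or a Python callable standing in for a lambda term.
    """
    cat: Category
    sem: Any


def forward_apply(left: Sign, right: Sign) -> Sign:
    """Forward application: X/Y  Y  =>  X, with semantics f(a)."""
    c = left.cat
    if isinstance(c, Functor) and c.slash == "/" and c.arg == right.cat:
        return Sign(c.result, left.sem(right.sem))
    raise ValueError("forward application does not apply")


def backward_apply(left: Sign, right: Sign) -> Sign:
    """Backward application: Y  X\\Y  =>  X, with semantics f(a)."""
    c = right.cat
    if isinstance(c, Functor) and c.slash == "\\" and c.arg == left.cat:
        return Sign(c.result, right.sem(left.sem))
    raise ValueError("backward application does not apply")


S, NP, N = Atom("S"), Atom("NP"), Atom("N")
VP = Functor(S, "\\", NP)  # S\NP

# Hypothetical toy lexicon: each entry pairs a category with a lambda term.
john = Sign(NP, "john")
mary = Sign(NP, "mary")
man = Sign(N, lambda x: f"man({x})")
walks = Sign(VP, lambda x: f"walk({x})")
loves = Sign(Functor(VP, "/", NP), lambda y: lambda x: f"love({x},{y})")
# The quantifier uses the fixed variable name "x"; a real system would
# generate fresh variables to avoid accidental capture.
every = Sign(
    Functor(Functor(S, "/", VP), "/", N),
    lambda p: lambda q: f"forall x.({p('x')} -> {q('x')})",
)

# "john loves mary": (S\NP)/NP applied to NP, then NP combined with S\NP.
sentence1 = backward_apply(john, forward_apply(loves, mary))

# "every man walks": (S/(S\NP))/N applied to N, then to S\NP.
sentence2 = forward_apply(forward_apply(every, man), walks)

print(sentence1.sem)  # love(john,mary)
print(sentence2.sem)  # forall x.(man(x) -> walk(x))
```

In this sketch the combinatory rules never inspect the words themselves: each rule checks the categories and applies one semantic term to the other, which is the sense in which the rules are semantically transparent.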