.. Copyright (C) 2001-2023 NLTK Project
.. For license information, see LICENSE.TXT
========
FrameNet
========
The FrameNet corpus is a lexical database of English that is both human-
and machine-readable, based on annotating examples of how words are used
in actual texts. FrameNet is based on a theory of meaning called Frame
Semantics, deriving from the work of Charles J. Fillmore and colleagues.
The basic idea is straightforward: the meanings of most words can
best be understood on the basis of a semantic frame, a description of a
type of event, relation, or entity and the participants in it. For
example, the concept of cooking typically involves a person doing the
cooking (Cook), the food that is to be cooked (Food), something to hold
the food while cooking (Container) and a source of heat
(Heating_instrument). In the FrameNet project, this is represented as a
frame called Apply_heat, and the Cook, Food, Heating_instrument and
Container are called frame elements (FEs). Words that evoke this frame,
such as fry, bake, boil, and broil, are called lexical units (LUs) of
the Apply_heat frame. The job of FrameNet is to define the frames
and to annotate sentences to show how the FEs fit syntactically around
the word that evokes the frame.
------
Frames
------
A Frame is a script-like conceptual structure that describes a
particular type of situation, object, or event along with the
participants and props that are needed for that Frame. For
example, the "Apply_heat" frame describes a common situation
involving a Cook, some Food, and a Heating_instrument, and is
evoked by words such as bake, blanch, boil, broil, brown,
simmer, steam, etc.
We call the roles of a Frame "frame elements" (FEs) and the
frame-evoking words are called "lexical units" (LUs).
FrameNet includes relations between Frames. Several types of
relations are defined, of which the most important are:
- Inheritance: An IS-A relation. The child frame is a subtype
of the parent frame, and each FE in the parent is bound to
a corresponding FE in the child. An example is the
"Revenge" frame which inherits from the
"Rewards_and_punishments" frame.
- Using: The child frame presupposes the parent frame as
background, e.g. the "Speed" frame "uses" (or presupposes)
the "Motion" frame; however, not all parent FEs need to be
bound to child FEs.
- Subframe: The child frame is a subevent of a complex event
represented by the parent, e.g. the "Criminal_process" frame
has subframes of "Arrest", "Arraignment", "Trial", and
"Sentencing".
- Perspective_on: The child frame provides a particular
perspective on an un-perspectivized parent frame. A pair of
examples consists of the "Hiring" and "Get_a_job" frames,
which perspectivize the "Employment_start" frame from the
Employer's and the Employee's point of view, respectively.
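The relation types above can be pictured as labelled parent/child edges
between frames. The following is a minimal sketch in plain Python, not the
NLTK API: the `RELATIONS` table and the `children_by_relation()` helper are
hypothetical names, and the table holds only the examples mentioned in the
list above.

```python
from collections import defaultdict

# Toy relation table: (relation type, parent frame, child frame).
# The frame names come from the examples above; the table itself is a
# hypothetical stand-in, not data loaded from FrameNet.
RELATIONS = [
    ("Inheritance", "Rewards_and_punishments", "Revenge"),
    ("Using", "Motion", "Speed"),
    ("Subframe", "Criminal_process", "Arrest"),
    ("Subframe", "Criminal_process", "Arraignment"),
    ("Subframe", "Criminal_process", "Trial"),
    ("Subframe", "Criminal_process", "Sentencing"),
    ("Perspective_on", "Employment_start", "Hiring"),
    ("Perspective_on", "Employment_start", "Get_a_job"),
]


def children_by_relation(relations):
    """Index child frames by (relation type, parent frame)."""
    index = defaultdict(list)
    for rel_type, parent, child in relations:
        index[(rel_type, parent)].append(child)
    return dict(index)


INDEX = children_by_relation(RELATIONS)
subframes = INDEX[("Subframe", "Criminal_process")]
perspectives = INDEX[("Perspective_on", "Employment_start")]
```

Keying the index on the relation type as well as the parent keeps the
different kinds of edges apart, so a lookup for subframes never returns a
frame that merely inherits from or uses the parent.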
To get a list of all of the Frames in FrameNet, you can use the
`frames()` function. If you supply a regular expression pattern to the
`frames()` function, you will get a list of all Frames whose names match
that pattern:
>>> from pprint import pprint
>>> from operator import itemgetter
>>> from nltk.corpus import framenet as fn
>>> from nltk.corpus.reader.framenet import PrettyList
>>> x = fn.frames(r'(?i)crim')
>>> x.sort(key=itemgetter('ID'))
>>> x
[<frame ID=200 name=Criminal_process>, <frame ID=500 name=Criminal_investigation>, ...]
>>> PrettyList(sorted(x, key=itemgetter('ID')))
[<frame ID=200 name=Criminal_process>, <frame ID=500 name=Criminal_investigation>, ...]
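The effect of the `(?i)` flag in the pattern can be illustrated without the
corpus. This is a rough sketch in plain Python: `FRAME_NAMES` and
`match_names()` are hypothetical, and the sketch assumes the pattern may
match anywhere in the name (`re.search` semantics), which may differ from
what the corpus reader does internally.

```python
import re

# Hypothetical sample of frame names, all of which appear in this document.
FRAME_NAMES = ["Criminal_process", "Criminal_investigation", "Arrest", "Apply_heat"]


def match_names(pattern, names):
    """Return the names in which `pattern` matches anywhere (re.search)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# "(?i)" makes the pattern case-insensitive, so "crim" matches "Criminal...".
case_insensitive = match_names(r"(?i)crim", FRAME_NAMES)

# Without the flag, lowercase "crim" does not match the capitalized names.
case_sensitive = match_names(r"crim", FRAME_NAMES)
```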
To get the details of a particular Frame, you can use the `frame()`
function, passing in the frame's ID number:
>>> from pprint import pprint
>>> from nltk.corpus import framenet as fn
>>> f = fn.frame(202)
>>> f.ID
202
>>> f.name
'Arrest'
>>> f.definition
"Authorities charge a Suspect, who is under suspicion of having committed a crime..."
>>> len(f.lexUnit)
11
>>> pprint(sorted([x for x in f.FE]))
['Authorities',
'Charges',
'Co-participant',
'Manner',
'Means',
'Offense',
'Place',
'Purpose',
'Source_of_legal_authority',
'Suspect',
'Time',
'Type']
>>> pprint(f.frameRelations)
[<Parent=Intentionally_affect -- Inheritance -> Child=Arrest>, <Complex=Criminal_process -- Subframe -> Component=Arrest>, ...]
The `frame()` function shown above returns a dict object containing
detailed information about the Frame. See the documentation on the
`frame()` function for the specifics.
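The examples above read frame fields both as attributes (`f.ID`) and as
keys (`itemgetter('ID')`). One way a dict can support both styles is
sketched below. `AttrDictSketch` is a hypothetical class written for
illustration only; it is not the class that NLTK uses.

```python
from operator import itemgetter


class AttrDictSketch(dict):
    """Hypothetical dict subclass whose keys are also readable as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods
        # such as .keys() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


frame_like = AttrDictSketch(ID=202, name="Arrest")
by_attribute = frame_like.ID           # attribute-style access
by_key = itemgetter("ID")(frame_like)  # key-style access, as used for sorting
```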
You can also search for Frames by their Lexical Units (LUs). The
`frames_by_lemma()` function returns a list of all frames that contain
LUs in which the 'name' attribute of the LU matches the given regular
expression. Note that LU names are composed of "lemma.POS", where the
"lemma" part can be made up of either a single lexeme (e.g. 'run') or
multiple lexemes (e.g. 'a little') (see below).
>>> PrettyList(sorted(fn.frames_by_lemma(r'(?i)a little'), key=itemgetter('ID')))
[<frame ID=189 name=Quanti...>, <frame ID=2001 name=Degree>]
-------------
Lexical Units
-------------
A lexical unit (LU) is a pairing of a word with a meaning. For
example, the "Apply_heat" Frame describes a common situation
involving a Cook, some Food, and a Heating_instrument, and is
_evoked_ by words such as bake, blanch, boil, broil, brown,
simmer, steam, etc. These frame-evoking words are the LUs in the
Apply_heat frame. Each sense of a polysemous word is a different
LU.
We have used the word "word" in talking about LUs. The reality
is actually rather complex. When we say that the word "bake" is
polysemous, we mean that the lemma "bake.v" (which has the
word-forms "bake", "bakes", "baked", and "baking") is linked to
three different frames:
- Apply_heat: "Michelle baked the potatoes for 45 minutes."
- Cooking_creation: "Michelle baked her mother a cake for her birthday."
- Absorb_heat: "The potatoes have to bake for more than 30 minutes."
These constitute three different LUs, with different
definitions.
Multiword expressions such as "given name" and hyphenated words
like "shut-eye" can also be LUs. Idiomatic phrases such as
"middle of nowhere" and "give the slip (to)" are also defined as
LUs in the appropriate frames ("Isolated_places" and "Evading",
respectively), and their internal structure is not analyzed.
FrameNet provides multiple annotated examples of each sense of a
word (i.e. each LU). Moreover, the set of examples
(approximately 20 per LU) illustrates all of the combinatorial
possibilities of the lexical unit.
Each LU is linked to a Frame, and hence to the other words which
evoke that Frame. This makes the FrameNet database similar to a
thesaurus, grouping together semantically similar words.
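The thesaurus-like grouping, and the way one lemma can yield several LUs,
can be sketched with two small indexes. This is plain Python over a
hypothetical `LU_FRAME_PAIRS` table built from the examples in this
document (the `.v` suffixes are assumed for illustration); it does not
query the corpus.

```python
from collections import defaultdict

# Hypothetical (LU name, frame name) pairs drawn from the examples above.
LU_FRAME_PAIRS = [
    ("bake.v", "Apply_heat"),
    ("boil.v", "Apply_heat"),
    ("fry.v", "Apply_heat"),
    ("bake.v", "Cooking_creation"),
    ("bake.v", "Absorb_heat"),
]


def build_indexes(pairs):
    """Return (frames evoked by each LU name, LU names in each frame)."""
    frames_of = defaultdict(list)
    lus_in = defaultdict(list)
    for lu_name, frame_name in pairs:
        frames_of[lu_name].append(frame_name)
        lus_in[frame_name].append(lu_name)
    return frames_of, lus_in


def frame_mates(lu_name, frame_name, lus_in):
    """Other LU names that evoke the same frame (the 'thesaurus' view)."""
    return [other for other in lus_in[frame_name] if other != lu_name]


frames_of, lus_in = build_indexes(LU_FRAME_PAIRS)
bake_senses = frames_of["bake.v"]  # one entry per sense, i.e. per LU
apply_heat_mates = frame_mates("bake.v", "Apply_heat", lus_in)
```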
In the simplest case, frame-evoking words are verbs such as
"fried" in:
"Matilde fried the catfish in a heavy iron skillet."
Sometimes event nouns may evoke a Frame. For example,
"reduction" evokes "Cause_change_of_scalar_position" in:
"...the reduction of debt levels to $665 million from $2.6 billion."
Adjectives may also evoke a Frame. For example, "asleep" may
evoke the "Sleep" frame as in:
"They were asleep for hours."
Many common nouns, such as artifacts like "hat" or "tower",
typically serve as dependents rather than clearly evoking their
own frames.
A list of lexical units can be obtained using the `lus()` function,
which takes an optional regular expression pattern that will be
matched against the name of the lexical unit:
>>> from pprint import pprint
>>> PrettyList(sorted(fn.lus(r'(?i)a little'), key=itemgetter('ID')))
[<lu ID=14733 name=a little.n>, <lu ID=14743 name=a little.adv>, ...]
You can obtain detailed information on a particular LU by calling the
`lu()` function and passing in an LU's 'ID' number:
>>> from pprint import pprint
>>> from nltk.corpus import framenet as fn
>>> fn.lu(256).name
'foresee.v'
>>> fn.lu(256).definition
'COD: be aware of beforehand; predict.'
>>> fn.lu(256).frame.name
'Expectation'
>>> fn.lu(256).lexemes[0].name
'foresee'
Note that LU names take the form of a dotted string (e.g. "run.v" or "a
little.adv") in which a lemma precedes the "." and a part of speech
(POS) follows the dot. The lemma may be composed of a single lexeme
(e.g. "run") or of multiple lexemes (e.g. "a little"). The list of
POSs used in the LUs is:
v - verb
n - noun
a - adjective
adv - adverb
prep - preposition
num - numbers
intj - interjection
art - article
c - conjunction
scon - subordinating conjunction
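The "lemma.POS" naming scheme can be taken apart with ordinary string
operations. The sketch below is plain Python rather than part of the NLTK
API: `POS_LABELS` copies the table above, `split_lu_name()` is a
hypothetical helper, and splitting the lemma on whitespace to recover its
lexemes is an assumption that fits the examples shown here.

```python
# POS abbreviations and their meanings, copied from the list above.
POS_LABELS = {
    "v": "verb",
    "n": "noun",
    "a": "adjective",
    "adv": "adverb",
    "prep": "preposition",
    "num": "numbers",
    "intj": "interjection",
    "art": "article",
    "c": "conjunction",
    "scon": "subordinating conjunction",
}


def split_lu_name(lu_name):
    """Split 'lemma.POS' into (lemma, POS, list of lexemes).

    The split is made at the last dot, so the lemma keeps any spaces.
    """
    lemma, pos = lu_name.rsplit(".", 1)
    return lemma, pos, lemma.split()


single = split_lu_name("run.v")
multi = split_lu_name("a little.adv")
multi_pos_label = POS_LABELS[multi[1]]
```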
For more detail on the contents of the dict returned by the `lu()`
function, see that function's documentation.
-------------------
Annotated Documents
-------------------
The FrameNet corpus contains a small set of annotated documents. A list
of these documents can be obtained by calling the `docs()` function:
>>> from pprint import pprint
>>> from nltk.corpus import framenet as fn
>>> d = fn.docs('BellRinging')[0]
>>> d.corpname
'PropBank'
>>> d.sentence[49]
full-text sentence (...) in BellRinging:
<BLANKLINE>
<BLANKLINE>
[POS] 17 tags
<BLANKLINE>
[POS_tagset] PENN
<BLANKLINE>
[text] + [annotationSet]
<BLANKLINE>
`` I live in hopes that the ringers themselves will be drawn into
             *****          *******                    *****
             Desir          Cause_t                    Cause
             [1]            [3]                        [2]
<BLANKLINE>
 that fuller life .
      ******
      Comple
      [4]
 (Desir=Desiring, Cause_t=Cause_to_make_noise, Cause=Cause_motion, Comple=Completeness)
<BLANKLINE>
>>> d.sentence[49].annotationSet[1]
annotation set (...):
<BLANKLINE>
[status] MANUAL
<BLANKLINE>
[LU] (6605) hope.n in Desiring
<BLANKLINE>
[frame] (366) Desiring
<BLANKLINE>
[GF] 2 relations
<BLANKLINE>
[PT] 2 phrases
<BLANKLINE>
[text] + [Target] + [FE] + [Noun]
<BLANKLINE>
`` I live in hopes that the ringers themselves will be drawn into
   - ^^^^ ^^ ***** ----------------------------------------------
   E supp su       Event
<BLANKLINE>
 that fuller life .
-----------------
<BLANKLINE>
 (E=Experiencer, su=supp)
<BLANKLINE>
<BLANKLINE>
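The marker rows in the displays above line up with the words they
annotate. A rough sketch of how such a row could be produced is given
below, assuming each annotation is a character span over the sentence
text. `marker_row()` is a hypothetical helper, not the corpus reader's own
rendering code.

```python
def marker_row(width, spans):
    """Build a row of marker characters for (start, end, char) spans.

    `start` is inclusive and `end` is exclusive; positions outside any
    span are left as spaces, and trailing spaces are stripped.
    """
    row = [" "] * width
    for start, end, char in spans:
        for i in range(start, end):
            row[i] = char
    return "".join(row).rstrip()


text = "`` I live in hopes that the ringers"
start = text.index("hopes")
end = start + len("hopes")

# Mark the target word with '*', as in the annotation set display above.
row = marker_row(len(text), [(start, end, "*")])
```

Printing `text` followed by `row` places the asterisks directly under
"hopes".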