EasyDetect / pipeline /nltk /test /featstruct.doctest
sunnychenxiwang's picture
update nltk
d916065
raw
history blame
38.9 kB
.. Copyright (C) 2001-2023 NLTK Project
.. For license information, see LICENSE.TXT
==================================
Feature Structures & Unification
==================================
>>> from nltk.featstruct import FeatStruct
>>> from nltk.sem.logic import Variable, VariableExpression, Expression
.. note:: For now, featstruct uses the older lambdalogic semantics
module. Eventually, it should be updated to use the new first
order predicate logic module.
Overview
~~~~~~~~
A feature structure is a mapping from feature identifiers to feature
values, where feature values can be simple values (like strings or
ints), nested feature structures, or variables:
>>> fs1 = FeatStruct(number='singular', person=3)
>>> print(fs1)
[ number = 'singular' ]
[ person = 3 ]
Feature structure may be nested:
>>> fs2 = FeatStruct(type='NP', agr=fs1)
>>> print(fs2)
[ agr = [ number = 'singular' ] ]
[ [ person = 3 ] ]
[ ]
[ type = 'NP' ]
Variables are used to indicate that two features should be assigned
the same value. For example, the following feature structure requires
that the feature fs3['agr']['number'] be bound to the same value as the
feature fs3['subj']['number'].
>>> fs3 = FeatStruct(agr=FeatStruct(number=Variable('?n')),
... subj=FeatStruct(number=Variable('?n')))
>>> print(fs3)
[ agr = [ number = ?n ] ]
[ ]
[ subj = [ number = ?n ] ]
Feature structures are typically used to represent partial information
about objects. A feature name that is not mapped to a value stands
for a feature whose value is unknown (*not* a feature without a
value). Two feature structures that represent (potentially
overlapping) information about the same object can be combined by
*unification*.
>>> print(fs2.unify(fs3))
[ agr = [ number = 'singular' ] ]
[ [ person = 3 ] ]
[ ]
[ subj = [ number = 'singular' ] ]
[ ]
[ type = 'NP' ]
When two inconsistent feature structures are unified, the unification
fails and returns ``None``.
>>> fs4 = FeatStruct(agr=FeatStruct(person=1))
>>> print(fs4.unify(fs2))
None
>>> print(fs2.unify(fs4))
None
..
>>> del fs1, fs2, fs3, fs4 # clean-up
Feature Structure Types
-----------------------
There are actually two types of feature structure:
- *feature dictionaries*, implemented by `FeatDict`, act like
Python dictionaries. Feature identifiers may be strings or
instances of the `Feature` class.
- *feature lists*, implemented by `FeatList`, act like Python
lists. Feature identifiers are integers.
When you construct a feature structure using the `FeatStruct`
constructor, it will automatically decide which type is appropriate:
>>> type(FeatStruct(number='singular'))
<class 'nltk.featstruct.FeatDict'>
>>> type(FeatStruct([1,2,3]))
<class 'nltk.featstruct.FeatList'>
Usually, we will just use feature dictionaries; but sometimes feature
lists can be useful too. Two feature lists will unify with each other
only if they have equal lengths, and all of their feature values
match. If you wish to write a feature list that contains 'unknown'
values, you must use variables:
>>> fs1 = FeatStruct([1,2,Variable('?y')])
>>> fs2 = FeatStruct([1,Variable('?x'),3])
>>> fs1.unify(fs2)
[1, 2, 3]
..
>>> del fs1, fs2 # clean-up
Parsing Feature Structure Strings
---------------------------------
Feature structures can be constructed directly from strings. Often,
this is more convenient than constructing them directly. NLTK can
parse most feature strings to produce the corresponding feature
structures. (But you must restrict your base feature values to
strings, ints, logic expressions (`nltk.sem.logic.Expression`), and a
few other types discussed below).
Feature dictionaries are written like Python dictionaries, except that
keys are not put in quotes; and square brackets (``[]``) are used
instead of braces (``{}``):
>>> FeatStruct('[tense="past", agr=[number="sing", person=3]]')
[agr=[number='sing', person=3], tense='past']
If a feature value is a single alphanumeric word, then it does not
need to be quoted -- it will be automatically treated as a string:
>>> FeatStruct('[tense=past, agr=[number=sing, person=3]]')
[agr=[number='sing', person=3], tense='past']
Feature lists are written like python lists:
>>> FeatStruct('[1, 2, 3]')
[1, 2, 3]
The expression ``[]`` is treated as an empty feature dictionary, not
an empty feature list:
>>> type(FeatStruct('[]'))
<class 'nltk.featstruct.FeatDict'>
Feature Paths
-------------
Features can be specified using *feature paths*, or tuples of feature
identifiers that specify path through the nested feature structures to
a value.
>>> fs1 = FeatStruct('[x=1, y=[1,2,[z=3]]]')
>>> fs1['y']
[1, 2, [z=3]]
>>> fs1['y', 2]
[z=3]
>>> fs1['y', 2, 'z']
3
..
>>> del fs1 # clean-up
Reentrance
----------
Feature structures may contain reentrant feature values. A *reentrant
feature value* is a single feature structure that can be accessed via
multiple feature paths.
>>> fs1 = FeatStruct(x='val')
>>> fs2 = FeatStruct(a=fs1, b=fs1)
>>> print(fs2)
[ a = (1) [ x = 'val' ] ]
[ ]
[ b -> (1) ]
>>> fs2
[a=(1)[x='val'], b->(1)]
As you can see, reentrane is displayed by marking a feature structure
with a unique identifier, in this case ``(1)``, the first time it is
encountered; and then using the special form ``var -> id`` whenever it
is encountered again. You can use the same notation to directly
create reentrant feature structures from strings.
>>> FeatStruct('[a=(1)[], b->(1), c=[d->(1)]]')
[a=(1)[], b->(1), c=[d->(1)]]
Reentrant feature structures may contain cycles:
>>> fs3 = FeatStruct('(1)[a->(1)]')
>>> fs3['a', 'a', 'a', 'a']
(1)[a->(1)]
>>> fs3['a', 'a', 'a', 'a'] is fs3
True
Unification preserves the reentrance relations imposed by both of the
unified feature structures. In the feature structure resulting from
unification, any modifications to a reentrant feature value will be
visible using any of its feature paths.
>>> fs3.unify(FeatStruct('[a=[b=12], c=33]'))
(1)[a->(1), b=12, c=33]
..
>>> del fs1, fs2, fs3 # clean-up
Feature Structure Equality
--------------------------
Two feature structures are considered equal if they assign the same
values to all features, *and* they contain the same reentrances.
>>> fs1 = FeatStruct('[a=(1)[x=1], b->(1)]')
>>> fs2 = FeatStruct('[a=(1)[x=1], b->(1)]')
>>> fs3 = FeatStruct('[a=[x=1], b=[x=1]]')
>>> fs1 == fs1, fs1 is fs1
(True, True)
>>> fs1 == fs2, fs1 is fs2
(True, False)
>>> fs1 == fs3, fs1 is fs3
(False, False)
Note that this differs from how Python dictionaries and lists define
equality -- in particular, Python dictionaries and lists ignore
reentrance relations. To test two feature structures for equality
while ignoring reentrance relations, use the `equal_values()` method:
>>> fs1.equal_values(fs1)
True
>>> fs1.equal_values(fs2)
True
>>> fs1.equal_values(fs3)
True
..
>>> del fs1, fs2, fs3 # clean-up
Feature Value Sets & Feature Value Tuples
-----------------------------------------
`nltk.featstruct` defines two new data types that are intended to be
used as feature values: `FeatureValueTuple` and `FeatureValueSet`.
Both of these types are considered base values -- i.e., unification
does *not* apply to them. However, variable binding *does* apply to
any values that they contain.
Feature value tuples are written with parentheses:
>>> fs1 = FeatStruct('[x=(?x, ?y)]')
>>> fs1
[x=(?x, ?y)]
>>> fs1.substitute_bindings({Variable('?x'): 1, Variable('?y'): 2})
[x=(1, 2)]
Feature sets are written with braces:
>>> fs1 = FeatStruct('[x={?x, ?y}]')
>>> fs1
[x={?x, ?y}]
>>> fs1.substitute_bindings({Variable('?x'): 1, Variable('?y'): 2})
[x={1, 2}]
In addition to the basic feature value tuple & set classes, nltk
defines feature value unions (for sets) and feature value
concatenations (for tuples). These are written using '+', and can be
used to combine sets & tuples:
>>> fs1 = FeatStruct('[x=((1, 2)+?z), z=?z]')
>>> fs1
[x=((1, 2)+?z), z=?z]
>>> fs1.unify(FeatStruct('[z=(3, 4, 5)]'))
[x=(1, 2, 3, 4, 5), z=(3, 4, 5)]
Thus, feature value tuples and sets can be used to build up tuples
and sets of values over the course of unification. For example, when
parsing sentences using a semantic feature grammar, feature sets or
feature tuples can be used to build a list of semantic predicates as
the sentence is parsed.
As was mentioned above, unification does not apply to feature value
tuples and sets. One reason for this that it's impossible to define a
single correct answer for unification when concatenation is used.
Consider the following example:
>>> fs1 = FeatStruct('[x=(1, 2, 3, 4)]')
>>> fs2 = FeatStruct('[x=(?a+?b), a=?a, b=?b]')
If unification applied to feature tuples, then the unification
algorithm would have to arbitrarily choose how to divide the tuple
(1,2,3,4) into two parts. Instead, the unification algorithm refuses
to make this decision, and simply unifies based on value. Because
(1,2,3,4) is not equal to (?a+?b), fs1 and fs2 will not unify:
>>> print(fs1.unify(fs2))
None
If you need a list-like structure that unification does apply to, use
`FeatList`.
..
>>> del fs1, fs2 # clean-up
Light-weight Feature Structures
-------------------------------
Many of the functions defined by `nltk.featstruct` can be applied
directly to simple Python dictionaries and lists, rather than to
full-fledged `FeatDict` and `FeatList` objects. In other words,
Python ``dicts`` and ``lists`` can be used as "light-weight" feature
structures.
>>> # Note: pprint prints dicts sorted
>>> from pprint import pprint
>>> from nltk.featstruct import unify
>>> pprint(unify(dict(x=1, y=dict()), dict(a='a', y=dict(b='b'))))
{'a': 'a', 'x': 1, 'y': {'b': 'b'}}
However, you should keep in mind the following caveats:
- Python dictionaries & lists ignore reentrance when checking for
equality between values. But two FeatStructs with different
reentrances are considered nonequal, even if all their base
values are equal.
- FeatStructs can be easily frozen, allowing them to be used as
keys in hash tables. Python dictionaries and lists can not.
- FeatStructs display reentrance in their string representations;
Python dictionaries and lists do not.
- FeatStructs may *not* be mixed with Python dictionaries and lists
(e.g., when performing unification).
- FeatStructs provide a number of useful methods, such as `walk()`
and `cyclic()`, which are not available for Python dicts & lists.
In general, if your feature structures will contain any reentrances,
or if you plan to use them as dictionary keys, it is strongly
recommended that you use full-fledged `FeatStruct` objects.
Custom Feature Values
---------------------
The abstract base class `CustomFeatureValue` can be used to define new
base value types that have custom unification methods. For example,
the following feature value type encodes a range, and defines
unification as taking the intersection on the ranges:
>>> from functools import total_ordering
>>> from nltk.featstruct import CustomFeatureValue, UnificationFailure
>>> @total_ordering
... class Range(CustomFeatureValue):
... def __init__(self, low, high):
... assert low <= high
... self.low = low
... self.high = high
... def unify(self, other):
... if not isinstance(other, Range):
... return UnificationFailure
... low = max(self.low, other.low)
... high = min(self.high, other.high)
... if low <= high: return Range(low, high)
... else: return UnificationFailure
... def __repr__(self):
... return '(%s<x<%s)' % (self.low, self.high)
... def __eq__(self, other):
... if not isinstance(other, Range):
... return False
... return (self.low == other.low) and (self.high == other.high)
... def __lt__(self, other):
... if not isinstance(other, Range):
... return True
... return (self.low, self.high) < (other.low, other.high)
>>> fs1 = FeatStruct(x=Range(5,8), y=FeatStruct(z=Range(7,22)))
>>> print(fs1.unify(FeatStruct(x=Range(6, 22))))
[ x = (6<x<8) ]
[ ]
[ y = [ z = (7<x<22) ] ]
>>> print(fs1.unify(FeatStruct(x=Range(9, 12))))
None
>>> print(fs1.unify(FeatStruct(x=12)))
None
>>> print(fs1.unify(FeatStruct('[x=?x, y=[z=?x]]')))
[ x = (7<x<8) ]
[ ]
[ y = [ z = (7<x<8) ] ]
Regression Tests
~~~~~~~~~~~~~~~~
Dictionary access methods (non-mutating)
----------------------------------------
>>> fs1 = FeatStruct(a=1, b=2, c=3)
>>> fs2 = FeatStruct(x=fs1, y='x')
Feature structures support all dictionary methods (excluding the class
method `dict.fromkeys()`). Non-mutating methods:
>>> sorted(fs2.keys()) # keys()
['x', 'y']
>>> sorted(fs2.values()) # values()
[[a=1, b=2, c=3], 'x']
>>> sorted(fs2.items()) # items()
[('x', [a=1, b=2, c=3]), ('y', 'x')]
>>> sorted(fs2) # __iter__()
['x', 'y']
>>> 'a' in fs2, 'x' in fs2 # __contains__()
(False, True)
>>> fs2.has_key('a'), fs2.has_key('x') # has_key()
(False, True)
>>> fs2['x'], fs2['y'] # __getitem__()
([a=1, b=2, c=3], 'x')
>>> fs2['a'] # __getitem__()
Traceback (most recent call last):
. . .
KeyError: 'a'
>>> fs2.get('x'), fs2.get('y'), fs2.get('a') # get()
([a=1, b=2, c=3], 'x', None)
>>> fs2.get('x', 'hello'), fs2.get('a', 'hello') # get()
([a=1, b=2, c=3], 'hello')
>>> len(fs1), len(fs2) # __len__
(3, 2)
>>> fs2.copy() # copy()
[x=[a=1, b=2, c=3], y='x']
>>> fs2.copy() is fs2 # copy()
False
Note: by default, `FeatStruct.copy()` does a deep copy. Use
`FeatStruct.copy(deep=False)` for a shallow copy.
..
>>> del fs1, fs2 # clean-up.
Dictionary access methods (mutating)
------------------------------------
>>> fs1 = FeatStruct(a=1, b=2, c=3)
>>> fs2 = FeatStruct(x=fs1, y='x')
Setting features (`__setitem__()`)
>>> fs1['c'] = 5
>>> fs1
[a=1, b=2, c=5]
>>> fs1['x'] = 12
>>> fs1
[a=1, b=2, c=5, x=12]
>>> fs2['x', 'a'] = 2
>>> fs2
[x=[a=2, b=2, c=5, x=12], y='x']
>>> fs1
[a=2, b=2, c=5, x=12]
Deleting features (`__delitem__()`)
>>> del fs1['x']
>>> fs1
[a=2, b=2, c=5]
>>> del fs2['x', 'a']
>>> fs1
[b=2, c=5]
`setdefault()`:
>>> fs1.setdefault('b', 99)
2
>>> fs1
[b=2, c=5]
>>> fs1.setdefault('x', 99)
99
>>> fs1
[b=2, c=5, x=99]
`update()`:
>>> fs2.update({'a':'A', 'b':'B'}, c='C')
>>> fs2
[a='A', b='B', c='C', x=[b=2, c=5, x=99], y='x']
`pop()`:
>>> fs2.pop('a')
'A'
>>> fs2
[b='B', c='C', x=[b=2, c=5, x=99], y='x']
>>> fs2.pop('a')
Traceback (most recent call last):
. . .
KeyError: 'a'
>>> fs2.pop('a', 'foo')
'foo'
>>> fs2
[b='B', c='C', x=[b=2, c=5, x=99], y='x']
`clear()`:
>>> fs1.clear()
>>> fs1
[]
>>> fs2
[b='B', c='C', x=[], y='x']
`popitem()`:
>>> sorted([fs2.popitem() for i in range(len(fs2))])
[('b', 'B'), ('c', 'C'), ('x', []), ('y', 'x')]
>>> fs2
[]
Once a feature structure has been frozen, it may not be mutated.
>>> fs1 = FeatStruct('[x=1, y=2, z=[a=3]]')
>>> fs1.freeze()
>>> fs1.frozen()
True
>>> fs1['z'].frozen()
True
>>> fs1['x'] = 5
Traceback (most recent call last):
. . .
ValueError: Frozen FeatStructs may not be modified.
>>> del fs1['x']
Traceback (most recent call last):
. . .
ValueError: Frozen FeatStructs may not be modified.
>>> fs1.clear()
Traceback (most recent call last):
. . .
ValueError: Frozen FeatStructs may not be modified.
>>> fs1.pop('x')
Traceback (most recent call last):
. . .
ValueError: Frozen FeatStructs may not be modified.
>>> fs1.popitem()
Traceback (most recent call last):
. . .
ValueError: Frozen FeatStructs may not be modified.
>>> fs1.setdefault('x')
Traceback (most recent call last):
. . .
ValueError: Frozen FeatStructs may not be modified.
>>> fs1.update(z=22)
Traceback (most recent call last):
. . .
ValueError: Frozen FeatStructs may not be modified.
..
>>> del fs1, fs2 # clean-up.
Feature Paths
-------------
Make sure that __getitem__ with feature paths works as intended:
>>> fs1 = FeatStruct(a=1, b=2,
... c=FeatStruct(
... d=FeatStruct(e=12),
... f=FeatStruct(g=55, h='hello')))
>>> fs1[()]
[a=1, b=2, c=[d=[e=12], f=[g=55, h='hello']]]
>>> fs1['a'], fs1[('a',)]
(1, 1)
>>> fs1['c','d','e']
12
>>> fs1['c','f','g']
55
Feature paths that select unknown features raise KeyError:
>>> fs1['c', 'f', 'e']
Traceback (most recent call last):
. . .
KeyError: ('c', 'f', 'e')
>>> fs1['q', 'p']
Traceback (most recent call last):
. . .
KeyError: ('q', 'p')
Feature paths that try to go 'through' a feature that's not a feature
structure raise KeyError:
>>> fs1['a', 'b']
Traceback (most recent call last):
. . .
KeyError: ('a', 'b')
Feature paths can go through reentrant structures:
>>> fs2 = FeatStruct('(1)[a=[b=[c->(1), d=5], e=11]]')
>>> fs2['a', 'b', 'c', 'a', 'e']
11
>>> fs2['a', 'b', 'c', 'a', 'b', 'd']
5
>>> fs2[tuple('abcabcabcabcabcabcabcabcabcabca')]
(1)[b=[c=[a->(1)], d=5], e=11]
Indexing requires strings, `Feature`\s, or tuples; other types raise a
TypeError:
>>> fs2[12]
Traceback (most recent call last):
. . .
TypeError: Expected feature name or path. Got 12.
>>> fs2[list('abc')]
Traceback (most recent call last):
. . .
TypeError: Expected feature name or path. Got ['a', 'b', 'c'].
Feature paths can also be used with `get()`, `has_key()`, and
`__contains__()`.
>>> fpath1 = tuple('abcabc')
>>> fpath2 = tuple('abcabz')
>>> fs2.get(fpath1), fs2.get(fpath2)
((1)[a=[b=[c->(1), d=5], e=11]], None)
>>> fpath1 in fs2, fpath2 in fs2
(True, False)
>>> fs2.has_key(fpath1), fs2.has_key(fpath2)
(True, False)
..
>>> del fs1, fs2 # clean-up
Reading Feature Structures
--------------------------
Empty feature struct:
>>> FeatStruct('[]')
[]
Test features with integer values:
>>> FeatStruct('[a=12, b=-33, c=0]')
[a=12, b=-33, c=0]
Test features with string values. Either single or double quotes may
be used. Strings are evaluated just like python strings -- in
particular, you can use escape sequences and 'u' and 'r' prefixes, and
triple-quoted strings.
>>> FeatStruct('[a="", b="hello", c="\'", d=\'\', e=\'"\']')
[a='', b='hello', c="'", d='', e='"']
>>> FeatStruct(r'[a="\\", b="\"", c="\x6f\\y", d="12"]')
[a='\\', b='"', c='o\\y', d='12']
>>> FeatStruct(r'[b=r"a\b\c"]')
[b='a\\b\\c']
>>> FeatStruct('[x="""a"""]')
[x='a']
Test parsing of reentrant feature structures.
>>> FeatStruct('[a=(1)[], b->(1)]')
[a=(1)[], b->(1)]
>>> FeatStruct('[a=(1)[x=1, y=2], b->(1)]')
[a=(1)[x=1, y=2], b->(1)]
Test parsing of cyclic feature structures.
>>> FeatStruct('[a=(1)[b->(1)]]')
[a=(1)[b->(1)]]
>>> FeatStruct('(1)[a=[b=[c->(1)]]]')
(1)[a=[b=[c->(1)]]]
Strings of the form "+name" and "-name" may be used to specify boolean
values.
>>> FeatStruct('[-bar, +baz, +foo]')
[-bar, +baz, +foo]
None, True, and False are recognized as values:
>>> FeatStruct('[bar=True, baz=False, foo=None]')
[+bar, -baz, foo=None]
Special features:
>>> FeatStruct('NP/VP')
NP[]/VP[]
>>> FeatStruct('?x/?x')
?x[]/?x[]
>>> print(FeatStruct('VP[+fin, agr=?x, tense=past]/NP[+pl, agr=?x]'))
[ *type* = 'VP' ]
[ ]
[ [ *type* = 'NP' ] ]
[ *slash* = [ agr = ?x ] ]
[ [ pl = True ] ]
[ ]
[ agr = ?x ]
[ fin = True ]
[ tense = 'past' ]
Here the slash feature gets coerced:
>>> FeatStruct('[*slash*=a, x=b, *type*="NP"]')
NP[x='b']/a[]
>>> FeatStruct('NP[sem=<bob>]/NP')
NP[sem=<bob>]/NP[]
>>> FeatStruct('S[sem=<walk(bob)>]')
S[sem=<walk(bob)>]
>>> print(FeatStruct('NP[sem=<bob>]/NP'))
[ *type* = 'NP' ]
[ ]
[ *slash* = [ *type* = 'NP' ] ]
[ ]
[ sem = <bob> ]
Playing with ranges:
>>> from nltk.featstruct import RangeFeature, FeatStructReader
>>> width = RangeFeature('width')
>>> reader = FeatStructReader([width])
>>> fs1 = reader.fromstring('[*width*=-5:12]')
>>> fs2 = reader.fromstring('[*width*=2:123]')
>>> fs3 = reader.fromstring('[*width*=-7:-2]')
>>> fs1.unify(fs2)
[*width*=(2, 12)]
>>> fs1.unify(fs3)
[*width*=(-5, -2)]
>>> print(fs2.unify(fs3)) # no overlap in width.
None
The slash feature has a default value of 'False':
>>> print(FeatStruct('NP[]/VP').unify(FeatStruct('NP[]'), trace=1))
<BLANKLINE>
Unification trace:
/ NP[]/VP[]
|\ NP[]
|
| Unify feature: *type*
| / 'NP'
| |\ 'NP'
| |
| +-->'NP'
|
| Unify feature: *slash*
| / VP[]
| |\ False
| |
X X <-- FAIL
None
The demo structures from category.py. They all parse, but they don't
do quite the right thing, -- ?x vs x.
>>> FeatStruct(pos='n', agr=FeatStruct(number='pl', gender='f'))
[agr=[gender='f', number='pl'], pos='n']
>>> FeatStruct(r'NP[sem=<bob>]/NP')
NP[sem=<bob>]/NP[]
>>> FeatStruct(r'S[sem=<app(?x, ?y)>]')
S[sem=<?x(?y)>]
>>> FeatStruct('?x/?x')
?x[]/?x[]
>>> FeatStruct('VP[+fin, agr=?x, tense=past]/NP[+pl, agr=?x]')
VP[agr=?x, +fin, tense='past']/NP[agr=?x, +pl]
>>> FeatStruct('S[sem = <app(?subj, ?vp)>]')
S[sem=<?subj(?vp)>]
>>> FeatStruct('S')
S[]
The parser also includes support for reading sets and tuples.
>>> FeatStruct('[x={1,2,2,2}, y={/}]')
[x={1, 2}, y={/}]
>>> FeatStruct('[x=(1,2,2,2), y=()]')
[x=(1, 2, 2, 2), y=()]
>>> print(FeatStruct('[x=(1,[z=(1,2,?x)],?z,{/})]'))
[ x = (1, [ z = (1, 2, ?x) ], ?z, {/}) ]
Note that we can't put a featstruct inside a tuple, because doing so
would hash it, and it's not frozen yet:
>>> print(FeatStruct('[x={[]}]'))
Traceback (most recent call last):
. . .
TypeError: FeatStructs must be frozen before they can be hashed.
There's a special syntax for taking the union of sets: "{...+...}".
The elements should only be variables or sets.
>>> FeatStruct('[x={?a+?b+{1,2,3}}]')
[x={?a+?b+{1, 2, 3}}]
There's a special syntax for taking the concatenation of tuples:
"(...+...)". The elements should only be variables or tuples.
>>> FeatStruct('[x=(?a+?b+(1,2,3))]')
[x=(?a+?b+(1, 2, 3))]
Parsing gives helpful messages if your string contains an error.
>>> FeatStruct('[a=, b=5]]')
Traceback (most recent call last):
. . .
ValueError: Error parsing feature structure
[a=, b=5]]
^ Expected value
>>> FeatStruct('[a=12 22, b=33]')
Traceback (most recent call last):
. . .
ValueError: Error parsing feature structure
[a=12 22, b=33]
^ Expected comma
>>> FeatStruct('[a=5] [b=6]')
Traceback (most recent call last):
. . .
ValueError: Error parsing feature structure
[a=5] [b=6]
^ Expected end of string
>>> FeatStruct(' *++*')
Traceback (most recent call last):
. . .
ValueError: Error parsing feature structure
*++*
^ Expected open bracket or identifier
>>> FeatStruct('[x->(1)]')
Traceback (most recent call last):
. . .
ValueError: Error parsing feature structure
[x->(1)]
^ Expected bound identifier
>>> FeatStruct('[x->y]')
Traceback (most recent call last):
. . .
ValueError: Error parsing feature structure
[x->y]
^ Expected identifier
>>> FeatStruct('')
Traceback (most recent call last):
. . .
ValueError: Error parsing feature structure
<BLANKLINE>
^ Expected open bracket or identifier
Unification
-----------
Very simple unifications give the expected results:
>>> FeatStruct().unify(FeatStruct())
[]
>>> FeatStruct(number='singular').unify(FeatStruct())
[number='singular']
>>> FeatStruct().unify(FeatStruct(number='singular'))
[number='singular']
>>> FeatStruct(number='singular').unify(FeatStruct(person=3))
[number='singular', person=3]
Merging nested structures:
>>> fs1 = FeatStruct('[A=[B=b]]')
>>> fs2 = FeatStruct('[A=[C=c]]')
>>> fs1.unify(fs2)
[A=[B='b', C='c']]
>>> fs2.unify(fs1)
[A=[B='b', C='c']]
A basic case of reentrant unification
>>> fs4 = FeatStruct('[A=(1)[B=b], E=[F->(1)]]')
>>> fs5 = FeatStruct("[A=[C='c'], E=[F=[D='d']]]")
>>> fs4.unify(fs5)
[A=(1)[B='b', C='c', D='d'], E=[F->(1)]]
>>> fs5.unify(fs4)
[A=(1)[B='b', C='c', D='d'], E=[F->(1)]]
More than 2 paths to a value
>>> fs1 = FeatStruct("[a=[],b=[],c=[],d=[]]")
>>> fs2 = FeatStruct('[a=(1)[], b->(1), c->(1), d->(1)]')
>>> fs1.unify(fs2)
[a=(1)[], b->(1), c->(1), d->(1)]
fs1[a] gets unified with itself
>>> fs1 = FeatStruct('[x=(1)[], y->(1)]')
>>> fs2 = FeatStruct('[x=(1)[], y->(1)]')
>>> fs1.unify(fs2)
[x=(1)[], y->(1)]
Bound variables should get forwarded appropriately
>>> fs1 = FeatStruct('[A=(1)[X=x], B->(1), C=?cvar, D=?dvar]')
>>> fs2 = FeatStruct('[A=(1)[Y=y], B=(2)[Z=z], C->(1), D->(2)]')
>>> fs1.unify(fs2)
[A=(1)[X='x', Y='y', Z='z'], B->(1), C->(1), D->(1)]
>>> fs2.unify(fs1)
[A=(1)[X='x', Y='y', Z='z'], B->(1), C->(1), D->(1)]
Cyclic structure created by unification.
>>> fs1 = FeatStruct('[F=(1)[], G->(1)]')
>>> fs2 = FeatStruct('[F=[H=(2)[]], G->(2)]')
>>> fs3 = fs1.unify(fs2)
>>> fs3
[F=(1)[H->(1)], G->(1)]
>>> fs3['F'] is fs3['G']
True
>>> fs3['F'] is fs3['G']['H']
True
>>> fs3['F'] is fs3['G']['H']['H']
True
>>> fs3['F'] is fs3['F']['H']['H']['H']['H']['H']['H']['H']['H']
True
Cyclic structure created w/ variables.
>>> fs1 = FeatStruct('[F=[H=?x]]')
>>> fs2 = FeatStruct('[F=?x]')
>>> fs3 = fs1.unify(fs2, rename_vars=False)
>>> fs3
[F=(1)[H->(1)]]
>>> fs3['F'] is fs3['F']['H']
True
>>> fs3['F'] is fs3['F']['H']['H']
True
>>> fs3['F'] is fs3['F']['H']['H']['H']['H']['H']['H']['H']['H']
True
Unifying w/ a cyclic feature structure.
>>> fs4 = FeatStruct('[F=[H=[H=[H=(1)[]]]], K->(1)]')
>>> fs3.unify(fs4)
[F=(1)[H->(1)], K->(1)]
>>> fs4.unify(fs3)
[F=(1)[H->(1)], K->(1)]
Variable bindings should preserve reentrance.
>>> bindings = {}
>>> fs1 = FeatStruct("[a=?x]")
>>> fs2 = fs1.unify(FeatStruct("[a=[]]"), bindings)
>>> fs2['a'] is bindings[Variable('?x')]
True
>>> fs2.unify(FeatStruct("[b=?x]"), bindings)
[a=(1)[], b->(1)]
Aliased variable tests
>>> fs1 = FeatStruct("[a=?x, b=?x]")
>>> fs2 = FeatStruct("[b=?y, c=?y]")
>>> bindings = {}
>>> fs3 = fs1.unify(fs2, bindings)
>>> fs3
[a=?x, b=?x, c=?x]
>>> bindings
{Variable('?y'): Variable('?x')}
>>> fs3.unify(FeatStruct("[a=1]"))
[a=1, b=1, c=1]
If we keep track of the bindings, then we can use the same variable
over multiple calls to unify.
>>> bindings = {}
>>> fs1 = FeatStruct('[a=?x]')
>>> fs2 = fs1.unify(FeatStruct('[a=[]]'), bindings)
>>> fs2.unify(FeatStruct('[b=?x]'), bindings)
[a=(1)[], b->(1)]
>>> bindings
{Variable('?x'): []}
..
>>> del fs1, fs2, fs3, fs4, fs5 # clean-up
Unification Bindings
--------------------
>>> bindings = {}
>>> fs1 = FeatStruct('[a=?x]')
>>> fs2 = FeatStruct('[a=12]')
>>> fs3 = FeatStruct('[b=?x]')
>>> fs1.unify(fs2, bindings)
[a=12]
>>> bindings
{Variable('?x'): 12}
>>> fs3.substitute_bindings(bindings)
[b=12]
>>> fs3 # substitute_bindings didn't mutate fs3.
[b=?x]
>>> fs2.unify(fs3, bindings)
[a=12, b=12]
>>> bindings = {}
>>> fs1 = FeatStruct('[a=?x, b=1]')
>>> fs2 = FeatStruct('[a=5, b=?x]')
>>> fs1.unify(fs2, bindings)
[a=5, b=1]
>>> sorted(bindings.items())
[(Variable('?x'), 5), (Variable('?x2'), 1)]
..
>>> del fs1, fs2, fs3 # clean-up
Expressions
-----------
>>> e = Expression.fromstring('\\P y.P(z,y)')
>>> fs1 = FeatStruct(x=e, y=Variable('z'))
>>> fs2 = FeatStruct(y=VariableExpression(Variable('John')))
>>> fs1.unify(fs2)
[x=<\P y.P(John,y)>, y=<John>]
Remove Variables
----------------
>>> FeatStruct('[a=?x, b=12, c=[d=?y]]').remove_variables()
[b=12, c=[]]
>>> FeatStruct('(1)[a=[b=?x,c->(1)]]').remove_variables()
(1)[a=[c->(1)]]
Equality & Hashing
------------------
The `equal_values` method checks whether two feature structures assign
the same value to every feature. If the optional argument
``check_reentrances`` is supplied, then it also returns false if there
is any difference in the reentrances.
>>> a = FeatStruct('(1)[x->(1)]')
>>> b = FeatStruct('(1)[x->(1)]')
>>> c = FeatStruct('(1)[x=[x->(1)]]')
>>> d = FeatStruct('[x=(1)[x->(1)]]')
>>> e = FeatStruct('(1)[x=[x->(1), y=1], y=1]')
>>> def compare(x,y):
... assert x.equal_values(y, True) == y.equal_values(x, True)
... assert x.equal_values(y, False) == y.equal_values(x, False)
... if x.equal_values(y, True):
... assert x.equal_values(y, False)
... print('equal values, same reentrance')
... elif x.equal_values(y, False):
... print('equal values, different reentrance')
... else:
... print('different values')
>>> compare(a, a)
equal values, same reentrance
>>> compare(a, b)
equal values, same reentrance
>>> compare(a, c)
equal values, different reentrance
>>> compare(a, d)
equal values, different reentrance
>>> compare(c, d)
equal values, different reentrance
>>> compare(a, e)
different values
>>> compare(c, e)
different values
>>> compare(d, e)
different values
>>> compare(e, e)
equal values, same reentrance
Feature structures may not be hashed until they are frozen:
>>> hash(a)
Traceback (most recent call last):
. . .
TypeError: FeatStructs must be frozen before they can be hashed.
>>> a.freeze()
>>> v = hash(a)
Feature structures define hash consistently. The following example
looks at the hash value for each (fs1,fs2) pair; if their hash values
are not equal, then they must not be equal. If their hash values are
equal, then display a message, and indicate whether their values are
indeed equal. Note that c and d currently have the same hash value,
even though they are not equal. That is not a bug, strictly speaking,
but it wouldn't be a bad thing if it changed.
>>> for fstruct in (a, b, c, d, e):
... fstruct.freeze()
>>> for fs1_name in 'abcde':
... for fs2_name in 'abcde':
... fs1 = locals()[fs1_name]
... fs2 = locals()[fs2_name]
... if hash(fs1) != hash(fs2):
... assert fs1 != fs2
... else:
... print('%s and %s have the same hash value,' %
... (fs1_name, fs2_name))
... if fs1 == fs2: print('and are equal')
... else: print('and are not equal')
a and a have the same hash value, and are equal
a and b have the same hash value, and are equal
b and a have the same hash value, and are equal
b and b have the same hash value, and are equal
c and c have the same hash value, and are equal
c and d have the same hash value, and are not equal
d and c have the same hash value, and are not equal
d and d have the same hash value, and are equal
e and e have the same hash value, and are equal
..
>>> del a, b, c, d, e, v # clean-up
Tracing
-------
>>> fs1 = FeatStruct('[a=[b=(1)[], c=?x], d->(1), e=[f=?x]]')
>>> fs2 = FeatStruct('[a=(1)[c="C"], e=[g->(1)]]')
>>> fs1.unify(fs2, trace=True)
<BLANKLINE>
Unification trace:
/ [a=[b=(1)[], c=?x], d->(1), e=[f=?x]]
|\ [a=(1)[c='C'], e=[g->(1)]]
|
| Unify feature: a
| / [b=[], c=?x]
| |\ [c='C']
| |
| | Unify feature: a.c
| | / ?x
| | |\ 'C'
| | |
| | +-->Variable('?x')
| |
| +-->[b=[], c=?x]
| Bindings: {?x: 'C'}
|
| Unify feature: e
| / [f=?x]
| |\ [g=[c='C']]
| |
| +-->[f=?x, g=[b=[], c=?x]]
| Bindings: {?x: 'C'}
|
+-->[a=(1)[b=(2)[], c='C'], d->(2), e=[f='C', g->(1)]]
Bindings: {?x: 'C'}
[a=(1)[b=(2)[], c='C'], d->(2), e=[f='C', g->(1)]]
>>>
>>> fs1 = FeatStruct('[a=?x, b=?z, c=?z]')
>>> fs2 = FeatStruct('[a=?y, b=?y, c=?q]')
>>> #fs1.unify(fs2, trace=True)
>>>
..
>>> del fs1, fs2 # clean-up
Unification on Dicts & Lists
----------------------------
It's possible to do unification on dictionaries:
>>> from nltk.featstruct import unify
>>> pprint(unify(dict(x=1, y=dict(z=2)), dict(x=1, q=5)), width=1)
{'q': 5, 'x': 1, 'y': {'z': 2}}
It's possible to do unification on lists as well:
>>> unify([1, 2, 3], [1, Variable('x'), 3])
[1, 2, 3]
Mixing dicts and lists is fine:
>>> pprint(unify([dict(x=1, y=dict(z=2)),3], [dict(x=1, q=5),3]),
... width=1)
[{'q': 5, 'x': 1, 'y': {'z': 2}}, 3]
Mixing dicts and FeatStructs is discouraged:
>>> unify(dict(x=1), FeatStruct(x=1))
Traceback (most recent call last):
. . .
ValueError: Mixing FeatStruct objects with Python dicts and lists is not supported.
But you can do it if you really want, by explicitly stating that both
dictionaries and FeatStructs should be treated as feature structures:
>>> unify(dict(x=1), FeatStruct(x=1), fs_class=(dict, FeatStruct))
{'x': 1}
Finding Conflicts
-----------------
>>> from nltk.featstruct import conflicts
>>> fs1 = FeatStruct('[a=[b=(1)[c=2], d->(1), e=[f->(1)]]]')
>>> fs2 = FeatStruct('[a=[b=[c=[x=5]], d=[c=2], e=[f=[c=3]]]]')
>>> for path in conflicts(fs1, fs2):
... print('%-8s: %r vs %r' % ('.'.join(path), fs1[path], fs2[path]))
a.b.c : 2 vs [x=5]
a.e.f.c : 2 vs 3
..
>>> del fs1, fs2 # clean-up
Retracting Bindings
-------------------
>>> from nltk.featstruct import retract_bindings
>>> bindings = {}
>>> fs1 = FeatStruct('[a=?x, b=[c=?y]]')
>>> fs2 = FeatStruct('[a=(1)[c=[d=1]], b->(1)]')
>>> fs3 = fs1.unify(fs2, bindings)
>>> print(fs3)
[ a = (1) [ c = [ d = 1 ] ] ]
[ ]
[ b -> (1) ]
>>> pprint(bindings)
{Variable('?x'): [c=[d=1]], Variable('?y'): [d=1]}
>>> retract_bindings(fs3, bindings)
[a=?x, b=?x]
>>> pprint(bindings)
{Variable('?x'): [c=?y], Variable('?y'): [d=1]}
Squashed Bugs
~~~~~~~~~~~~~
In svn rev 5167, unifying two feature structures that used the same
variable would cause those variables to become aliased in the output.
>>> fs1 = FeatStruct('[a=?x]')
>>> fs2 = FeatStruct('[b=?x]')
>>> fs1.unify(fs2)
[a=?x, b=?x2]
There was a bug in svn revision 5172 that caused `rename_variables` to
rename variables to names that are already used.
>>> FeatStruct('[a=?x, b=?x2]').rename_variables(
... vars=[Variable('?x')])
[a=?x3, b=?x2]
>>> fs1 = FeatStruct('[a=?x]')
>>> fs2 = FeatStruct('[a=?x, b=?x2]')
>>> fs1.unify(fs2)
[a=?x, b=?x2]
There was a bug in svn rev 5167 that caused us to get the following
example wrong. Basically the problem was that we only followed
'forward' pointers for other, not self, when unifying two feature
structures. (nb: this test assumes that features are unified in
alphabetical order -- if they are not, it might pass even if the bug
is present.)
>>> fs1 = FeatStruct('[a=[x=1], b=?x, c=?x]')
>>> fs2 = FeatStruct('[a=(1)[], b->(1), c=[x=2]]')
>>> print(fs1.unify(fs2))
None
..
>>> del fs1, fs2 # clean-up