Sample usage for tree¶
Unit tests for nltk.tree.Tree¶
>>> from nltk.tree import *
Some trees to run tests on:
>>> dp1 = Tree('dp', [Tree('d', ['the']), Tree('np', ['dog'])])
>>> dp2 = Tree('dp', [Tree('d', ['the']), Tree('np', ['cat'])])
>>> vp = Tree('vp', [Tree('v', ['chased']), dp2])
>>> tree = Tree('s', [dp1, vp])
>>> print(tree)
(s (dp (d the) (np dog)) (vp (v chased) (dp (d the) (np cat))))
The node label is accessed using the label() method:
>>> dp1.label(), dp2.label(), vp.label(), tree.label()
('dp', 'dp', 'vp', 's')
>>> print(tree[1,1,1,0])
cat
The treepositions method returns a list of the tree positions of subtrees and leaves in a tree. By default, it gives the position of every tree, subtree, and leaf, in prefix order:
>>> print(tree.treepositions())
[(), (0,), (0, 0), (0, 0, 0), (0, 1), (0, 1, 0), (1,), (1, 0), (1, 0, 0), (1, 1), (1, 1, 0), (1, 1, 0, 0), (1, 1, 1), (1, 1, 1, 0)]
In addition to str and repr, several methods exist to convert a tree object to one of several standard tree encodings:
>>> print(tree.pformat_latex_qtree())
\Tree [.s
[.dp [.d the ] [.np dog ] ]
[.vp [.v chased ] [.dp [.d the ] [.np cat ] ] ] ]
There is also a fancy ASCII art representation:
>>> tree.pretty_print()
s
________|_____
| vp
| _____|___
dp | dp
___|___ | ___|___
d np v d np
| | | | |
the dog chased the cat
>>> tree.pretty_print(unicodelines=True, nodedist=4)
s
┌──────────────┴────────┐
│ vp
│ ┌────────┴──────┐
dp │ dp
┌──────┴──────┐ │ ┌──────┴──────┐
d np v d np
│ │ │ │ │
the dog chased the cat
Trees can be initialized from treebank strings:
>>> tree2 = Tree.fromstring('(S (NP I) (VP (V enjoyed) (NP my cookie)))')
>>> print(tree2)
(S (NP I) (VP (V enjoyed) (NP my cookie)))
Trees can be compared for equality:
>>> tree == Tree.fromstring(str(tree))
True
>>> tree2 == Tree.fromstring(str(tree2))
True
>>> tree == tree2
False
>>> tree == Tree.fromstring(str(tree2))
False
>>> tree2 == Tree.fromstring(str(tree))
False
>>> tree != Tree.fromstring(str(tree))
False
>>> tree2 != Tree.fromstring(str(tree2))
False
>>> tree != tree2
True
>>> tree != Tree.fromstring(str(tree2))
True
>>> tree2 != Tree.fromstring(str(tree))
True
>>> tree < tree2 or tree > tree2
True
Tree Parsing¶
The class method Tree.fromstring() can be used to parse trees, and it provides some additional options.
>>> tree = Tree.fromstring('(S (NP I) (VP (V enjoyed) (NP my cookie)))')
>>> print(tree)
(S (NP I) (VP (V enjoyed) (NP my cookie)))
When called on a subclass of Tree, it will create trees of that type:
>>> tree = ImmutableTree.fromstring('(VP (V enjoyed) (NP my cookie))')
>>> print(tree)
(VP (V enjoyed) (NP my cookie))
>>> print(type(tree))
<class 'nltk.tree.immutable.ImmutableTree'>
>>> tree[1] = 'x'
Traceback (most recent call last):
. . .
ValueError: ImmutableTree may not be modified
>>> del tree[0]
Traceback (most recent call last):
. . .
ValueError: ImmutableTree may not be modified
The brackets
parameter can be used to specify two characters that
should be used as brackets:
>>> print(Tree.fromstring('[S [NP I] [VP [V enjoyed] [NP my cookie]]]',
... brackets='[]'))
(S (NP I) (VP (V enjoyed) (NP my cookie)))
>>> print(Tree.fromstring('<S <NP I> <VP <V enjoyed> <NP my cookie>>>',
... brackets='<>'))
(S (NP I) (VP (V enjoyed) (NP my cookie)))
If brackets
is not a string, or is not exactly two characters,
then Tree.fromstring raises an exception:
>>> Tree.fromstring('<VP <V enjoyed> <NP my cookie>>', brackets='')
Traceback (most recent call last):
. . .
TypeError: brackets must be a length-2 string
>>> Tree.fromstring('<VP <V enjoyed> <NP my cookie>>', brackets='<<>>')
Traceback (most recent call last):
. . .
TypeError: brackets must be a length-2 string
>>> Tree.fromstring('<VP <V enjoyed> <NP my cookie>>', brackets=12)
Traceback (most recent call last):
. . .
TypeError: brackets must be a length-2 string
>>> Tree.fromstring('<<NP my cookie>>', brackets=('<<','>>'))
Traceback (most recent call last):
. . .
TypeError: brackets must be a length-2 string
(We may add support for multi-character brackets in the future, in
which case the brackets=('<<','>>')
example would start working.)
Whitespace brackets are not permitted:
>>> Tree.fromstring('(NP my cookie\n', brackets='(\n')
Traceback (most recent call last):
. . .
TypeError: whitespace brackets not allowed
If an invalid tree is given to Tree.fromstring, then it raises a ValueError, with a description of the problem:
>>> Tree.fromstring('(NP my cookie) (NP my milk)')
Traceback (most recent call last):
. . .
ValueError: Tree.fromstring(): expected 'end-of-string' but got '(NP'
at index 15.
"...y cookie) (NP my mil..."
^
>>> Tree.fromstring(')NP my cookie(')
Traceback (most recent call last):
. . .
ValueError: Tree.fromstring(): expected '(' but got ')'
at index 0.
")NP my coo..."
^
>>> Tree.fromstring('(NP my cookie))')
Traceback (most recent call last):
. . .
ValueError: Tree.fromstring(): expected 'end-of-string' but got ')'
at index 14.
"...my cookie))"
^
>>> Tree.fromstring('my cookie)')
Traceback (most recent call last):
. . .
ValueError: Tree.fromstring(): expected '(' but got 'my'
at index 0.
"my cookie)"
^
>>> Tree.fromstring('(NP my cookie')
Traceback (most recent call last):
. . .
ValueError: Tree.fromstring(): expected ')' but got 'end-of-string'
at index 13.
"... my cookie"
^
>>> Tree.fromstring('')
Traceback (most recent call last):
. . .
ValueError: Tree.fromstring(): expected '(' but got 'end-of-string'
at index 0.
""
^
Trees with no children are supported:
>>> print(Tree.fromstring('(S)'))
(S )
>>> print(Tree.fromstring('(X (Y) (Z))'))
(X (Y ) (Z ))
Trees with an empty node label and no children are supported:
>>> print(Tree.fromstring('()'))
( )
>>> print(Tree.fromstring('(X () ())'))
(X ( ) ( ))
Trees with an empty node label and children are supported, but only if the first child is not a leaf (otherwise, it will be treated as the node label).
>>> print(Tree.fromstring('((A) (B) (C))'))
( (A ) (B ) (C ))
>>> print(Tree.fromstring('((A) leaf)'))
( (A ) leaf)
>>> print(Tree.fromstring('(((())))'))
( ( ( ( ))))
The optional arguments read_node and read_leaf may be used to transform the string values of nodes or leaves.
>>> print(Tree.fromstring('(A b (C d e) (F (G h i)))',
... read_node=lambda s: '<%s>' % s,
... read_leaf=lambda s: '"%s"' % s))
(<A> "b" (<C> "d" "e") (<F> (<G> "h" "i")))
These transformation functions are typically used when the node or leaf labels should be parsed to a non-string value (such as a feature structure). If node and leaf labels need to be able to include whitespace, then you must also use the optional node_pattern and leaf_pattern arguments.
>>> from nltk.featstruct import FeatStruct
>>> tree = Tree.fromstring('([cat=NP] [lex=the] [lex=dog])',
... read_node=FeatStruct, read_leaf=FeatStruct)
>>> tree.set_label(tree.label().unify(FeatStruct('[num=singular]')))
>>> print(tree)
([cat='NP', num='singular'] [lex='the'] [lex='dog'])
The optional argument remove_empty_top_bracketing
can be used to
remove any top-level empty bracketing that occurs.
>>> print(Tree.fromstring('((S (NP I) (VP (V enjoyed) (NP my cookie))))',
... remove_empty_top_bracketing=True))
(S (NP I) (VP (V enjoyed) (NP my cookie)))
It will not remove a top-level empty bracketing with multiple children:
>>> print(Tree.fromstring('((A a) (B b))'))
( (A a) (B b))
Tree.fromlist()¶
The class method Tree.fromlist() can be used to parse trees that are expressed as nested lists, such as those produced by the tree() function from the wordnet module.
>>> from nltk.corpus import wordnet as wn
>>> t=Tree.fromlist(wn.synset('dog.n.01').tree(lambda s:s.hypernyms()))
>>> print(t.height())
14
>>> print(t.leaves())
["Synset('entity.n.01')", "Synset('entity.n.01')"]
>>> t.pretty_print()
Synset('dog.n.01')
_________________|__________________
Synset('canine.n. |
02') |
| |
Synset('carnivor |
e.n.01') |
| |
Synset('placenta |
l.n.01') |
| |
Synset('mammal.n. |
01') |
| |
Synset('vertebra |
te.n.01') |
| |
Synset('chordate. Synset('domestic
n.01') _animal.n.01')
| |
Synset('animal.n. Synset('animal.n.
01') 01')
| |
Synset('organism. Synset('organism.
n.01') n.01')
| |
Synset('living_t Synset('living_t
hing.n.01') hing.n.01')
| |
Synset('whole.n. Synset('whole.n.
02') 02')
| |
Synset('object.n. Synset('object.n.
01') 01')
| |
Synset('physical Synset('physical
_entity.n.01') _entity.n.01')
| |
Synset('entity.n. Synset('entity.n.
01') 01')
Parented Trees¶
ParentedTree is a subclass of Tree that automatically maintains parent pointers for single-parented trees. Parented trees can be created directly from a node label and a list of children:
>>> ptree = (
... ParentedTree('VP', [
... ParentedTree('VERB', ['saw']),
... ParentedTree('NP', [
... ParentedTree('DET', ['the']),
... ParentedTree('NOUN', ['dog'])])]))
>>> print(ptree)
(VP (VERB saw) (NP (DET the) (NOUN dog)))
Parented trees can be created from strings using the classmethod ParentedTree.fromstring:
>>> ptree = ParentedTree.fromstring('(VP (VERB saw) (NP (DET the) (NOUN dog)))')
>>> print(ptree)
(VP (VERB saw) (NP (DET the) (NOUN dog)))
>>> print(type(ptree))
<class 'nltk.tree.parented.ParentedTree'>
Parented trees can also be created by using the classmethod ParentedTree.convert to convert another type of tree to a parented tree:
>>> tree = Tree.fromstring('(VP (VERB saw) (NP (DET the) (NOUN dog)))')
>>> ptree = ParentedTree.convert(tree)
>>> print(ptree)
(VP (VERB saw) (NP (DET the) (NOUN dog)))
>>> print(type(ptree))
<class 'nltk.tree.parented.ParentedTree'>
ParentedTrees should never be used in the same tree as Trees or MultiParentedTrees. Mixing tree implementations may result in incorrect parent pointers and in TypeError exceptions:
>>> # Inserting a Tree in a ParentedTree gives an exception:
>>> ParentedTree('NP', [
... Tree('DET', ['the']), Tree('NOUN', ['dog'])])
Traceback (most recent call last):
. . .
TypeError: Can not insert a non-ParentedTree into a ParentedTree
>>> # inserting a ParentedTree in a Tree gives incorrect parent pointers:
>>> broken_tree = Tree('NP', [
... ParentedTree('DET', ['the']), ParentedTree('NOUN', ['dog'])])
>>> print(broken_tree[0].parent())
None
Parented Tree Methods¶
In addition to all the methods defined by the Tree class, the ParentedTree class adds six new methods whose values are automatically updated whenever a parented tree is modified: parent(), parent_index(), left_sibling(), right_sibling(), root(), and treeposition().
The parent() method contains a ParentedTree‘s parent, if it has
one; and None
otherwise. ParentedTrees that do not have
parents are known as “root trees.”
>>> for subtree in ptree.subtrees():
... print(subtree)
... print(' Parent = %s' % subtree.parent())
(VP (VERB saw) (NP (DET the) (NOUN dog)))
Parent = None
(VERB saw)
Parent = (VP (VERB saw) (NP (DET the) (NOUN dog)))
(NP (DET the) (NOUN dog))
Parent = (VP (VERB saw) (NP (DET the) (NOUN dog)))
(DET the)
Parent = (NP (DET the) (NOUN dog))
(NOUN dog)
Parent = (NP (DET the) (NOUN dog))
The parent_index() method stores the index of a tree in its parent’s
child list. If a tree does not have a parent, then its parent_index
is None
.
>>> for subtree in ptree.subtrees():
... print(subtree)
... print(' Parent Index = %s' % subtree.parent_index())
... assert (subtree.parent() is None or
... subtree.parent()[subtree.parent_index()] is subtree)
(VP (VERB saw) (NP (DET the) (NOUN dog)))
Parent Index = None
(VERB saw)
Parent Index = 0
(NP (DET the) (NOUN dog))
Parent Index = 1
(DET the)
Parent Index = 0
(NOUN dog)
Parent Index = 1
Note that ptree.parent().index(ptree)
is not equivalent to
ptree.parent_index()
. In particular, ptree.parent().index(ptree)
will return the index of the first child of ptree.parent()
that is
equal to ptree
(using ==
); and that child may not be
ptree
:
>>> on_and_on = ParentedTree('CONJP', [
... ParentedTree('PREP', ['on']),
... ParentedTree('COJN', ['and']),
... ParentedTree('PREP', ['on'])])
>>> second_on = on_and_on[2]
>>> print(second_on.parent_index())
2
>>> print(second_on.parent().index(second_on))
0
The methods left_sibling() and right_sibling() can be used to get a
parented tree’s siblings. If a tree does not have a left or right
sibling, then the corresponding method’s value is None
:
>>> for subtree in ptree.subtrees():
... print(subtree)
... print(' Left Sibling = %s' % subtree.left_sibling())
... print(' Right Sibling = %s' % subtree.right_sibling())
(VP (VERB saw) (NP (DET the) (NOUN dog)))
Left Sibling = None
Right Sibling = None
(VERB saw)
Left Sibling = None
Right Sibling = (NP (DET the) (NOUN dog))
(NP (DET the) (NOUN dog))
Left Sibling = (VERB saw)
Right Sibling = None
(DET the)
Left Sibling = None
Right Sibling = (NOUN dog)
(NOUN dog)
Left Sibling = (DET the)
Right Sibling = None
A parented tree’s root tree can be accessed using the root() method. This method follows the tree’s parent pointers until it finds a tree without a parent. If a tree does not have a parent, then it is its own root:
>>> for subtree in ptree.subtrees():
... print(subtree)
... print(' Root = %s' % subtree.root())
(VP (VERB saw) (NP (DET the) (NOUN dog)))
Root = (VP (VERB saw) (NP (DET the) (NOUN dog)))
(VERB saw)
Root = (VP (VERB saw) (NP (DET the) (NOUN dog)))
(NP (DET the) (NOUN dog))
Root = (VP (VERB saw) (NP (DET the) (NOUN dog)))
(DET the)
Root = (VP (VERB saw) (NP (DET the) (NOUN dog)))
(NOUN dog)
Root = (VP (VERB saw) (NP (DET the) (NOUN dog)))
The treeposition() method can be used to find a tree’s treeposition relative to its root:
>>> for subtree in ptree.subtrees():
... print(subtree)
... print(' Tree Position = %s' % (subtree.treeposition(),))
... assert subtree.root()[subtree.treeposition()] is subtree
(VP (VERB saw) (NP (DET the) (NOUN dog)))
Tree Position = ()
(VERB saw)
Tree Position = (0,)
(NP (DET the) (NOUN dog))
Tree Position = (1,)
(DET the)
Tree Position = (1, 0)
(NOUN dog)
Tree Position = (1, 1)
Whenever a parented tree is modified, all of the methods described
above (parent(), parent_index(), left_sibling(), right_sibling(),
root(), and treeposition()) are automatically updated. For example,
if we replace ptree
‘s subtree for the word “dog” with a new
subtree for “cat,” the method values for both the “dog” subtree and the
“cat” subtree get automatically updated:
>>> # Replace the dog with a cat
>>> dog = ptree[1,1]
>>> cat = ParentedTree('NOUN', ['cat'])
>>> ptree[1,1] = cat
>>> # the noun phrase is no longer the dog's parent:
>>> print(dog.parent(), dog.parent_index(), dog.left_sibling())
None None None
>>> # dog is now its own root.
>>> print(dog.root())
(NOUN dog)
>>> print(dog.treeposition())
()
>>> # the cat's parent is now the noun phrase:
>>> print(cat.parent())
(NP (DET the) (NOUN cat))
>>> print(cat.parent_index())
1
>>> print(cat.left_sibling())
(DET the)
>>> print(cat.root())
(VP (VERB saw) (NP (DET the) (NOUN cat)))
>>> print(cat.treeposition())
(1, 1)
ParentedTree Regression Tests¶
Keep track of all trees that we create (including subtrees) using this variable:
>>> all_ptrees = []
Define a helper function to create new parented trees:
>>> def make_ptree(s):
... ptree = ParentedTree.convert(Tree.fromstring(s))
... all_ptrees.extend(t for t in ptree.subtrees()
... if isinstance(t, Tree))
... return ptree
Define a test function that examines every subtree in all_ptrees; and checks that all six of its methods are defined correctly. If any ptrees are passed as arguments, then they are printed.
>>> def pcheck(*print_ptrees):
... for ptree in all_ptrees:
... # Check ptree's methods.
... if ptree.parent() is not None:
... i = ptree.parent_index()
... assert ptree.parent()[i] is ptree
... if i > 0:
... assert ptree.left_sibling() is ptree.parent()[i-1]
... if i < (len(ptree.parent())-1):
... assert ptree.right_sibling() is ptree.parent()[i+1]
... assert len(ptree.treeposition()) > 0
... assert (ptree.treeposition() ==
... ptree.parent().treeposition() + (ptree.parent_index(),))
... assert ptree.root() is not ptree
... assert ptree.root() is not None
... assert ptree.root() is ptree.parent().root()
... assert ptree.root()[ptree.treeposition()] is ptree
... else:
... assert ptree.parent_index() is None
... assert ptree.left_sibling() is None
... assert ptree.right_sibling() is None
... assert ptree.root() is ptree
... assert ptree.treeposition() == ()
... # Check ptree's children's methods:
... for i, child in enumerate(ptree):
... if isinstance(child, Tree):
... # pcheck parent() & parent_index() methods
... assert child.parent() is ptree
... assert child.parent_index() == i
... # pcheck sibling methods
... if i == 0:
... assert child.left_sibling() is None
... else:
... assert child.left_sibling() is ptree[i-1]
... if i == len(ptree)-1:
... assert child.right_sibling() is None
... else:
... assert child.right_sibling() is ptree[i+1]
... if print_ptrees:
... print('ok!', end=' ')
... for ptree in print_ptrees: print(ptree)
... else:
... print('ok!')
Run our test function on a variety of newly-created trees:
>>> pcheck(make_ptree('(A)'))
ok! (A )
>>> pcheck(make_ptree('(A (B (C (D) (E f)) g) h)'))
ok! (A (B (C (D ) (E f)) g) h)
>>> pcheck(make_ptree('(A (B) (C c) (D d d) (E e e e))'))
ok! (A (B ) (C c) (D d d) (E e e e))
>>> pcheck(make_ptree('(A (B) (C (c)) (D (d) (d)) (E (e) (e) (e)))'))
ok! (A (B ) (C (c )) (D (d ) (d )) (E (e ) (e ) (e )))
Run our test function after performing various tree-modification operations:
__delitem__()
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> e = ptree[0,0,1]
>>> del ptree[0,0,1]; pcheck(ptree); pcheck(e)
ok! (A (B (C (D ) (Q p)) g) h)
ok! (E f)
>>> del ptree[0,0,0]; pcheck(ptree)
ok! (A (B (C (Q p)) g) h)
>>> del ptree[0,1]; pcheck(ptree)
ok! (A (B (C (Q p))) h)
>>> del ptree[-1]; pcheck(ptree)
ok! (A (B (C (Q p))))
>>> del ptree[-100]
Traceback (most recent call last):
. . .
IndexError: index out of range
>>> del ptree[()]
Traceback (most recent call last):
. . .
IndexError: The tree position () may not be deleted.
>>> # With slices:
>>> ptree = make_ptree('(A (B c) (D e) f g (H i) j (K l))')
>>> b = ptree[0]
>>> del ptree[0:0]; pcheck(ptree)
ok! (A (B c) (D e) f g (H i) j (K l))
>>> del ptree[:1]; pcheck(ptree); pcheck(b)
ok! (A (D e) f g (H i) j (K l))
ok! (B c)
>>> del ptree[-2:]; pcheck(ptree)
ok! (A (D e) f g (H i))
>>> del ptree[1:3]; pcheck(ptree)
ok! (A (D e) (H i))
>>> ptree = make_ptree('(A (B c) (D e) f g (H i) j (K l))')
>>> del ptree[5:1000]; pcheck(ptree)
ok! (A (B c) (D e) f g (H i))
>>> del ptree[-2:1000]; pcheck(ptree)
ok! (A (B c) (D e) f)
>>> del ptree[-100:1]; pcheck(ptree)
ok! (A (D e) f)
>>> ptree = make_ptree('(A (B c) (D e) f g (H i) j (K l))')
>>> del ptree[1:-2:2]; pcheck(ptree)
ok! (A (B c) f (H i) j (K l))
__setitem__()
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> d, e, q = ptree[0,0]
>>> ptree[0,0,0] = 'x'; pcheck(ptree); pcheck(d)
ok! (A (B (C x (E f) (Q p)) g) h)
ok! (D )
>>> ptree[0,0,1] = make_ptree('(X (Y z))'); pcheck(ptree); pcheck(e)
ok! (A (B (C x (X (Y z)) (Q p)) g) h)
ok! (E f)
>>> ptree[1] = d; pcheck(ptree)
ok! (A (B (C x (X (Y z)) (Q p)) g) (D ))
>>> ptree[-1] = 'x'; pcheck(ptree)
ok! (A (B (C x (X (Y z)) (Q p)) g) x)
>>> ptree[-100] = 'y'
Traceback (most recent call last):
. . .
IndexError: index out of range
>>> ptree[()] = make_ptree('(X y)')
Traceback (most recent call last):
. . .
IndexError: The tree position () may not be assigned to.
>>> # With slices:
>>> ptree = make_ptree('(A (B c) (D e) f g (H i) j (K l))')
>>> b = ptree[0]
>>> ptree[0:0] = ('x', make_ptree('(Y)')); pcheck(ptree)
ok! (A x (Y ) (B c) (D e) f g (H i) j (K l))
>>> ptree[2:6] = (); pcheck(ptree); pcheck(b)
ok! (A x (Y ) (H i) j (K l))
ok! (B c)
>>> ptree[-2:] = ('z', 'p'); pcheck(ptree)
ok! (A x (Y ) (H i) z p)
>>> ptree[1:3] = [make_ptree('(X)') for x in range(10)]; pcheck(ptree)
ok! (A x (X ) (X ) (X ) (X ) (X ) (X ) (X ) (X ) (X ) (X ) z p)
>>> ptree[5:1000] = []; pcheck(ptree)
ok! (A x (X ) (X ) (X ) (X ))
>>> ptree[-2:1000] = ['n']; pcheck(ptree)
ok! (A x (X ) (X ) n)
>>> ptree[-100:1] = [make_ptree('(U v)')]; pcheck(ptree)
ok! (A (U v) (X ) (X ) n)
>>> ptree[-1:] = (make_ptree('(X)') for x in range(3)); pcheck(ptree)
ok! (A (U v) (X ) (X ) (X ) (X ) (X ))
>>> ptree[1:-2:2] = ['x', 'y']; pcheck(ptree)
ok! (A (U v) x (X ) y (X ) (X ))
append()
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> ptree.append('x'); pcheck(ptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x)
>>> ptree.append(make_ptree('(X (Y z))')); pcheck(ptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x (X (Y z)))
extend()
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> ptree.extend(['x', 'y', make_ptree('(X (Y z))')]); pcheck(ptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x y (X (Y z)))
>>> ptree.extend([]); pcheck(ptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x y (X (Y z)))
>>> ptree.extend(make_ptree('(X)') for x in range(3)); pcheck(ptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x y (X (Y z)) (X ) (X ) (X ))
insert()
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> ptree.insert(0, make_ptree('(X (Y z))')); pcheck(ptree)
ok! (A (X (Y z)) (B (C (D ) (E f) (Q p)) g) h)
>>> ptree.insert(-1, make_ptree('(X (Y z))')); pcheck(ptree)
ok! (A (X (Y z)) (B (C (D ) (E f) (Q p)) g) (X (Y z)) h)
>>> ptree.insert(-4, make_ptree('(X (Y z))')); pcheck(ptree)
ok! (A (X (Y z)) (X (Y z)) (B (C (D ) (E f) (Q p)) g) (X (Y z)) h)
>>> # Note: as with ``list``, inserting at a negative index that
>>> # gives a position before the start of the list does *not*
>>> # raise an IndexError exception; it just inserts at 0.
>>> ptree.insert(-400, make_ptree('(X (Y z))')); pcheck(ptree)
ok! (A
(X (Y z))
(X (Y z))
(X (Y z))
(B (C (D ) (E f) (Q p)) g)
(X (Y z))
h)
pop()
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> ptree[0,0].pop(1); pcheck(ptree)
ParentedTree('E', ['f'])
ok! (A (B (C (D ) (Q p)) g) h)
>>> ptree[0].pop(-1); pcheck(ptree)
'g'
ok! (A (B (C (D ) (Q p))) h)
>>> ptree.pop(); pcheck(ptree)
'h'
ok! (A (B (C (D ) (Q p))))
>>> ptree.pop(-100)
Traceback (most recent call last):
. . .
IndexError: index out of range
remove()
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> e = ptree[0,0,1]
>>> ptree[0,0].remove(ptree[0,0,1]); pcheck(ptree); pcheck(e)
ok! (A (B (C (D ) (Q p)) g) h)
ok! (E f)
>>> ptree[0,0].remove(make_ptree('(Q p)')); pcheck(ptree)
ok! (A (B (C (D )) g) h)
>>> ptree[0,0].remove(make_ptree('(Q p)'))
Traceback (most recent call last):
. . .
ValueError: ParentedTree('Q', ['p']) is not in list
>>> ptree.remove('h'); pcheck(ptree)
ok! (A (B (C (D )) g))
>>> ptree.remove('h');
Traceback (most recent call last):
. . .
ValueError: 'h' is not in list
>>> # remove() removes the first subtree that is equal (==) to the
>>> # given tree, which may not be the identical tree we give it:
>>> ptree = make_ptree('(A (X x) (Y y) (X x))')
>>> x1, y, x2 = ptree
>>> ptree.remove(ptree[-1]); pcheck(ptree)
ok! (A (Y y) (X x))
>>> print(x1.parent()); pcheck(x1)
None
ok! (X x)
>>> print(x2.parent())
(A (Y y) (X x))
Test that a tree can not be given multiple parents:
>>> ptree = make_ptree('(A (X x) (Y y) (Z z))')
>>> ptree[0] = ptree[1]
Traceback (most recent call last):
. . .
ValueError: Can not insert a subtree that already has a parent.
>>> pcheck()
ok!
[more to be written]
Shallow copying can be tricky for Tree and several of its subclasses. For shallow copies of Tree, only the root node is reconstructed, while all the children are shared between the two trees. Modify the children of one tree - and the shallowly copied tree will also update.
>>> from nltk.tree import Tree, ParentedTree, MultiParentedTree
>>> tree = Tree.fromstring("(TOP (S (NP (NNP Bell,)) (NP (NP (DT a) (NN company)) (SBAR (WHNP (WDT which)) (S (VP (VBZ is) (VP (VBN based) (PP (IN in) (NP (NNP LA,)))))))) (VP (VBZ makes) (CC and) (VBZ distributes) (NP (NN computer))) (. products.)))")
>>> copy_tree = tree.copy(deep=False)
>>> tree == copy_tree # Ensure identical labels and nodes
True
>>> id(copy_tree[0]) == id(tree[0]) # Ensure shallow copy - the children are the same objects in memory
True
For ParentedTree objects, this behaviour is not possible. With a shallow copy, the children of the root node would be reused for both the original and the shallow copy. For this to be possible, some children would need to have multiple parents. As this is forbidden for ParentedTree objects, attempting to make a shallow copy will cause a warning, and a deep copy is made instead.
>>> ptree = ParentedTree.fromstring("(TOP (S (NP (NNP Bell,)) (NP (NP (DT a) (NN company)) (SBAR (WHNP (WDT which)) (S (VP (VBZ is) (VP (VBN based) (PP (IN in) (NP (NNP LA,)))))))) (VP (VBZ makes) (CC and) (VBZ distributes) (NP (NN computer))) (. products.)))")
>>> copy_ptree = ptree.copy(deep=False)
>>> copy_ptree == ptree # Ensure identical labels and nodes
True
>>> id(copy_ptree[0]) != id(ptree[0]) # Shallow copying isn't supported - it defaults to deep copy.
True
For MultiParentedTree objects, the issue of only allowing one parent that can be seen for ParentedTree objects is no more. Shallow copying a MultiParentedTree gives the children of the root node two parents: the original and the newly copied root.
>>> mptree = MultiParentedTree.fromstring("(TOP (S (NP (NNP Bell,)) (NP (NP (DT a) (NN company)) (SBAR (WHNP (WDT which)) (S (VP (VBZ is) (VP (VBN based) (PP (IN in) (NP (NNP LA,)))))))) (VP (VBZ makes) (CC and) (VBZ distributes) (NP (NN computer))) (. products.)))")
>>> len(mptree[0].parents())
1
>>> copy_mptree = mptree.copy(deep=False)
>>> copy_mptree == mptree # Ensure identical labels and nodes
True
>>> len(mptree[0].parents())
2
>>> len(copy_mptree[0].parents())
2
Shallow copying a MultiParentedTree is similar to creating a second root which is identically labeled as the root on which the copy method was called.
ImmutableParentedTree Regression Tests¶
>>> iptree = ImmutableParentedTree.convert(ptree)
>>> type(iptree)
<class 'nltk.tree.immutable.ImmutableParentedTree'>
>>> del iptree[0]
Traceback (most recent call last):
. . .
ValueError: ImmutableParentedTree may not be modified
>>> iptree.set_label('newnode')
Traceback (most recent call last):
. . .
ValueError: ImmutableParentedTree may not be modified
MultiParentedTree Regression Tests¶
Keep track of all trees that we create (including subtrees) using this variable:
>>> all_mptrees = []
Define a helper function to create new parented trees:
>>> def make_mptree(s):
... mptree = MultiParentedTree.convert(Tree.fromstring(s))
... all_mptrees.extend(t for t in mptree.subtrees()
... if isinstance(t, Tree))
... return mptree
Define a test function that examines every subtree in all_mptrees; and checks that all six of its methods are defined correctly. If any mptrees are passed as arguments, then they are printed.
>>> def mpcheck(*print_mptrees):
... def has(seq, val): # uses identity comparison
... for item in seq:
... if item is val: return True
... return False
... for mptree in all_mptrees:
... # Check mptree's methods.
... if len(mptree.parents()) == 0:
... assert len(mptree.left_siblings()) == 0
... assert len(mptree.right_siblings()) == 0
... assert len(mptree.roots()) == 1
... assert mptree.roots()[0] is mptree
... assert mptree.treepositions(mptree) == [()]
... left_siblings = right_siblings = ()
... roots = {id(mptree): 1}
... else:
... roots = dict((id(r), 0) for r in mptree.roots())
... left_siblings = mptree.left_siblings()
... right_siblings = mptree.right_siblings()
... for parent in mptree.parents():
... for i in mptree.parent_indices(parent):
... assert parent[i] is mptree
... # check left siblings
... if i > 0:
... for j in range(len(left_siblings)):
... if left_siblings[j] is parent[i-1]:
... del left_siblings[j]
... break
... else:
... assert 0, 'sibling not found!'
... # check ight siblings
... if i < (len(parent)-1):
... for j in range(len(right_siblings)):
... if right_siblings[j] is parent[i+1]:
... del right_siblings[j]
... break
... else:
... assert 0, 'sibling not found!'
... # check roots
... for root in parent.roots():
... assert id(root) in roots, 'missing root'
... roots[id(root)] += 1
... # check that we don't have any unexplained values
... assert len(left_siblings)==0, 'unexpected sibling'
... assert len(right_siblings)==0, 'unexpected sibling'
... for v in roots.values(): assert v>0, roots #'unexpected root'
... # check treepositions
... for root in mptree.roots():
... for treepos in mptree.treepositions(root):
... assert root[treepos] is mptree
... # Check mptree's children's methods:
... for i, child in enumerate(mptree):
... if isinstance(child, Tree):
... # mpcheck parent() & parent_index() methods
... assert has(child.parents(), mptree)
... assert i in child.parent_indices(mptree)
... # mpcheck sibling methods
... if i > 0:
... assert has(child.left_siblings(), mptree[i-1])
... if i < len(mptree)-1:
... assert has(child.right_siblings(), mptree[i+1])
... if print_mptrees:
... print('ok!', end=' ')
... for mptree in print_mptrees: print(mptree)
... else:
... print('ok!')
Run our test function on a variety of newly-created trees:
>>> mpcheck(make_mptree('(A)'))
ok! (A )
>>> mpcheck(make_mptree('(A (B (C (D) (E f)) g) h)'))
ok! (A (B (C (D ) (E f)) g) h)
>>> mpcheck(make_mptree('(A (B) (C c) (D d d) (E e e e))'))
ok! (A (B ) (C c) (D d d) (E e e e))
>>> mpcheck(make_mptree('(A (B) (C (c)) (D (d) (d)) (E (e) (e) (e)))'))
ok! (A (B ) (C (c )) (D (d ) (d )) (E (e ) (e ) (e )))
>>> subtree = make_mptree('(A (B (C (D) (E f)) g) h)')
Including some trees that contain multiple parents:
>>> mpcheck(MultiParentedTree('Z', [subtree, subtree]))
ok! (Z (A (B (C (D ) (E f)) g) h) (A (B (C (D ) (E f)) g) h))
Run our test function after performing various tree-modification operations (n.b., these are the same tests that we ran for ParentedTree, above; thus, none of these trees actually uses multiple parents.)
__delitem__()
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> e = mptree[0,0,1]
>>> del mptree[0,0,1]; mpcheck(mptree); mpcheck(e)
ok! (A (B (C (D ) (Q p)) g) h)
ok! (E f)
>>> del mptree[0,0,0]; mpcheck(mptree)
ok! (A (B (C (Q p)) g) h)
>>> del mptree[0,1]; mpcheck(mptree)
ok! (A (B (C (Q p))) h)
>>> del mptree[-1]; mpcheck(mptree)
ok! (A (B (C (Q p))))
>>> del mptree[-100]
Traceback (most recent call last):
. . .
IndexError: index out of range
>>> del mptree[()]
Traceback (most recent call last):
. . .
IndexError: The tree position () may not be deleted.
>>> # With slices:
>>> mptree = make_mptree('(A (B c) (D e) f g (H i) j (K l))')
>>> b = mptree[0]
>>> del mptree[0:0]; mpcheck(mptree)
ok! (A (B c) (D e) f g (H i) j (K l))
>>> del mptree[:1]; mpcheck(mptree); mpcheck(b)
ok! (A (D e) f g (H i) j (K l))
ok! (B c)
>>> del mptree[-2:]; mpcheck(mptree)
ok! (A (D e) f g (H i))
>>> del mptree[1:3]; mpcheck(mptree)
ok! (A (D e) (H i))
>>> mptree = make_mptree('(A (B c) (D e) f g (H i) j (K l))')
>>> del mptree[5:1000]; mpcheck(mptree)
ok! (A (B c) (D e) f g (H i))
>>> del mptree[-2:1000]; mpcheck(mptree)
ok! (A (B c) (D e) f)
>>> del mptree[-100:1]; mpcheck(mptree)
ok! (A (D e) f)
>>> mptree = make_mptree('(A (B c) (D e) f g (H i) j (K l))')
>>> del mptree[1:-2:2]; mpcheck(mptree)
ok! (A (B c) f (H i) j (K l))
__setitem__()
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> d, e, q = mptree[0,0]
>>> mptree[0,0,0] = 'x'; mpcheck(mptree); mpcheck(d)
ok! (A (B (C x (E f) (Q p)) g) h)
ok! (D )
>>> mptree[0,0,1] = make_mptree('(X (Y z))'); mpcheck(mptree); mpcheck(e)
ok! (A (B (C x (X (Y z)) (Q p)) g) h)
ok! (E f)
>>> mptree[1] = d; mpcheck(mptree)
ok! (A (B (C x (X (Y z)) (Q p)) g) (D ))
>>> mptree[-1] = 'x'; mpcheck(mptree)
ok! (A (B (C x (X (Y z)) (Q p)) g) x)
>>> mptree[-100] = 'y'
Traceback (most recent call last):
. . .
IndexError: index out of range
>>> mptree[()] = make_mptree('(X y)')
Traceback (most recent call last):
. . .
IndexError: The tree position () may not be assigned to.
>>> # With slices:
>>> mptree = make_mptree('(A (B c) (D e) f g (H i) j (K l))')
>>> b = mptree[0]
>>> mptree[0:0] = ('x', make_mptree('(Y)')); mpcheck(mptree)
ok! (A x (Y ) (B c) (D e) f g (H i) j (K l))
>>> mptree[2:6] = (); mpcheck(mptree); mpcheck(b)
ok! (A x (Y ) (H i) j (K l))
ok! (B c)
>>> mptree[-2:] = ('z', 'p'); mpcheck(mptree)
ok! (A x (Y ) (H i) z p)
>>> mptree[1:3] = [make_mptree('(X)') for x in range(10)]; mpcheck(mptree)
ok! (A x (X ) (X ) (X ) (X ) (X ) (X ) (X ) (X ) (X ) (X ) z p)
>>> mptree[5:1000] = []; mpcheck(mptree)
ok! (A x (X ) (X ) (X ) (X ))
>>> mptree[-2:1000] = ['n']; mpcheck(mptree)
ok! (A x (X ) (X ) n)
>>> mptree[-100:1] = [make_mptree('(U v)')]; mpcheck(mptree)
ok! (A (U v) (X ) (X ) n)
>>> mptree[-1:] = (make_mptree('(X)') for x in range(3)); mpcheck(mptree)
ok! (A (U v) (X ) (X ) (X ) (X ) (X ))
>>> mptree[1:-2:2] = ['x', 'y']; mpcheck(mptree)
ok! (A (U v) x (X ) y (X ) (X ))
append()
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> mptree.append('x'); mpcheck(mptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x)
>>> mptree.append(make_mptree('(X (Y z))')); mpcheck(mptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x (X (Y z)))
extend()
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> mptree.extend(['x', 'y', make_mptree('(X (Y z))')]); mpcheck(mptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x y (X (Y z)))
>>> mptree.extend([]); mpcheck(mptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x y (X (Y z)))
>>> mptree.extend(make_mptree('(X)') for x in range(3)); mpcheck(mptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x y (X (Y z)) (X ) (X ) (X ))
insert()
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> mptree.insert(0, make_mptree('(X (Y z))')); mpcheck(mptree)
ok! (A (X (Y z)) (B (C (D ) (E f) (Q p)) g) h)
>>> mptree.insert(-1, make_mptree('(X (Y z))')); mpcheck(mptree)
ok! (A (X (Y z)) (B (C (D ) (E f) (Q p)) g) (X (Y z)) h)
>>> mptree.insert(-4, make_mptree('(X (Y z))')); mpcheck(mptree)
ok! (A (X (Y z)) (X (Y z)) (B (C (D ) (E f) (Q p)) g) (X (Y z)) h)
>>> # Note: as with ``list``, inserting at a negative index that
>>> # gives a position before the start of the list does *not*
>>> # raise an IndexError exception; it just inserts at 0.
>>> mptree.insert(-400, make_mptree('(X (Y z))')); mpcheck(mptree)
ok! (A
(X (Y z))
(X (Y z))
(X (Y z))
(B (C (D ) (E f) (Q p)) g)
(X (Y z))
h)
pop()
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> mptree[0,0].pop(1); mpcheck(mptree)
MultiParentedTree('E', ['f'])
ok! (A (B (C (D ) (Q p)) g) h)
>>> mptree[0].pop(-1); mpcheck(mptree)
'g'
ok! (A (B (C (D ) (Q p))) h)
>>> mptree.pop(); mpcheck(mptree)
'h'
ok! (A (B (C (D ) (Q p))))
>>> mptree.pop(-100)
Traceback (most recent call last):
. . .
IndexError: index out of range
remove()
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> e = mptree[0,0,1]
>>> mptree[0,0].remove(mptree[0,0,1]); mpcheck(mptree); mpcheck(e)
ok! (A (B (C (D ) (Q p)) g) h)
ok! (E f)
>>> mptree[0,0].remove(make_mptree('(Q p)')); mpcheck(mptree)
ok! (A (B (C (D )) g) h)
>>> mptree[0,0].remove(make_mptree('(Q p)'))
Traceback (most recent call last):
. . .
ValueError: MultiParentedTree('Q', ['p']) is not in list
>>> mptree.remove('h'); mpcheck(mptree)
ok! (A (B (C (D )) g))
>>> mptree.remove('h');
Traceback (most recent call last):
. . .
ValueError: 'h' is not in list
>>> # remove() removes the first subtree that is equal (==) to the
>>> # given tree, which may not be the identical tree we give it:
>>> mptree = make_mptree('(A (X x) (Y y) (X x))')
>>> x1, y, x2 = mptree
>>> mptree.remove(mptree[-1]); mpcheck(mptree)
ok! (A (Y y) (X x))
>>> print([str(p) for p in x1.parents()])
[]
>>> print([str(p) for p in x2.parents()])
['(A (Y y) (X x))']
ImmutableMultiParentedTree Regression Tests¶
>>> imptree = ImmutableMultiParentedTree.convert(mptree)
>>> type(imptree)
<class 'nltk.tree.immutable.ImmutableMultiParentedTree'>
>>> del imptree[0]
Traceback (most recent call last):
. . .
ValueError: ImmutableMultiParentedTree may not be modified
>>> imptree.set_label('newnode')
Traceback (most recent call last):
. . .
ValueError: ImmutableMultiParentedTree may not be modified
ProbabilisticTree Regression Tests¶
>>> prtree = ProbabilisticTree("S", [ProbabilisticTree("NP", ["N"], prob=0.3)], prob=0.6)
>>> print(prtree)
(S (NP N)) (p=0.6)
>>> import copy
>>> prtree == copy.deepcopy(prtree) == prtree.copy(deep=True) == prtree.copy()
True
>>> prtree[0] is prtree.copy()[0]
True
>>> prtree[0] is prtree.copy(deep=True)[0]
False
>>> imprtree = ImmutableProbabilisticTree.convert(prtree)
>>> type(imprtree)
<class 'nltk.tree.immutable.ImmutableProbabilisticTree'>
>>> del imprtree[0]
Traceback (most recent call last):
. . .
ValueError: ImmutableProbabilisticTree may not be modified
>>> imprtree.set_label('newnode')
Traceback (most recent call last):
. . .
ValueError: ImmutableProbabilisticTree may not be modified
Squashed Bugs¶
This used to discard the (B b)
subtree (fixed in svn 6270):
>>> print(Tree.fromstring('((A a) (B b))'))
( (A a) (B b))
Pickling ParentedTree instances didn’t work for Python 3.7 onwards (See #2478)
>>> import pickle
>>> tree = ParentedTree.fromstring('(S (NN x) (NP x) (NN x))')
>>> print(tree)
(S (NN x) (NP x) (NN x))
>>> pickled = pickle.dumps(tree)
>>> tree_loaded = pickle.loads(pickled)
>>> print(tree_loaded)
(S (NN x) (NP x) (NN x))
ParentedTree used to be impossible to (deep)copy. (See #1324)
>>> from nltk.tree import ParentedTree
>>> import copy
>>> tree = ParentedTree.fromstring("(TOP (S (NP (NNP Bell,)) (NP (NP (DT a) (NN company)) (SBAR (WHNP (WDT which)) (S (VP (VBZ is) (VP (VBN based) (PP (IN in) (NP (NNP LA,)))))))) (VP (VBZ makes) (CC and) (VBZ distributes) (NP (NN computer))) (. products.)))")
>>> tree == copy.deepcopy(tree) == copy.copy(tree) == tree.copy(deep=True) == tree.copy()
True