nltk.corpus.reader.nombank module
- class nltk.corpus.reader.nombank.NombankChainTreePointer
Bases: NombankPointer
- pieces
A list of the pieces that make up this chain. Elements may be either NombankSplitTreePointer or NombankTreePointer pointers.
- class nltk.corpus.reader.nombank.NombankCorpusReader
Bases: CorpusReader
Corpus reader for the NomBank corpus, which augments the Penn Treebank with information about the predicate-argument structure of every noun instance. The corpus consists of two parts: the predicate-argument annotations themselves, and a set of “frameset files” which define the argument labels used by the annotations, on a per-noun basis. Each “frameset file” contains one or more predicates, such as 'turn' or 'turn_on', each of which is divided into coarse-grained word senses called “rolesets”. For each “roleset”, the frameset file provides descriptions of the argument roles, along with examples.
- __init__(root, nomfile, framefiles='', nounsfile=None, parse_fileid_xform=None, parse_corpus=None, encoding='utf8')
- Parameters
root – The root directory for this corpus.
nomfile – The name of the file containing the predicate-argument annotations (relative to root).
framefiles – A list or regexp specifying the frameset fileids for this corpus.
parse_fileid_xform – A transform that should be applied to the fileids in this corpus. This should be a function of one argument (a fileid) that returns a string (the new fileid).
parse_corpus – The corpus containing the parse trees corresponding to this corpus. These parse trees are necessary to resolve the tree pointers used by NomBank.
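To make parse_fileid_xform concrete: it is simply a one-argument function from a fileid to a new fileid. The sketch below assumes (hypothetically) that the annotation file names trees as wsj/NN/wsj_NNNN.mrg while the parse corpus names the same file wsj_NNNN.mrg; the function name strip_wsj_section is illustrative, not part of NLTK.

```python
import re


def strip_wsj_section(fileid: str) -> str:
    """Hypothetical parse_fileid_xform: drop a leading 'wsj/NN/' directory.

    Assumes annotation fileids look like 'wsj/00/wsj_0001.mrg' while the
    parse corpus stores the same tree under 'wsj_0001.mrg'.
    """
    return re.sub(r"^wsj/\d\d/", "", fileid)


print(strip_wsj_section("wsj/00/wsj_0001.mrg"))  # wsj_0001.mrg
print(strip_wsj_section("wsj_0001.mrg"))  # already bare, so unchanged
```

A function like this would be passed as parse_fileid_xform=strip_wsj_section when constructing the reader; fileids that do not match the pattern pass through untouched.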
- instances(baseform=None)
If baseform is given, only instances whose baseform equals it are included.
- Returns
a corpus view that acts as a list of NombankInstance objects, one for each noun instance in the corpus.
- lines()
- Returns
a corpus view that acts as a list of strings, one for each line in the predicate-argument annotation file.
- class nltk.corpus.reader.nombank.NombankInstance
Bases: object
- __init__(fileid, sentnum, wordnum, baseform, sensenumber, predicate, predid, arguments, parse_corpus=None)
- arguments
A list of tuples (argloc, argid), specifying the location and identifier of each of the predicate’s arguments in the containing sentence. Argument identifiers are strings such as 'ARG0' or 'ARGM-TMP'. This list does not contain the predicate.
- baseform
The baseform of the predicate.
- fileid
The name of the file containing the parse tree for this instance’s sentence.
- parse_corpus
A corpus reader for the parse trees corresponding to the instances in this NomBank corpus.
- predicate
A NombankTreePointer indicating the position of this instance’s predicate within its containing sentence.
- predid
Identifier of the predicate.
- property roleset
The name of the roleset used by this instance’s predicate. Use nombank.roleset() (NombankCorpusReader.roleset) to look up information about the roleset.
- sensenumber
The sense number of the predicate.
- sentnum
The sentence number of this sentence within fileid. Indexing starts from zero.
- property tree
The parse tree corresponding to this instance, or None if the corresponding tree is not available.
- wordnum
The word number of this instance’s predicate within its containing sentence. Word numbers are indexed starting from zero, and include traces and other empty parse elements.
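The attributes above map naturally onto one line of the annotation file returned by lines(). A minimal sketch, assuming whitespace-separated fields in the same order as the __init__ parameters (fileid, sentnum, wordnum, baseform, sensenumber), followed by location-ID items of which exactly one carries "-rel" and marks the predicate. The AnnotationFields tuple, the parse_annotation_line helper, and the sample line are all hypothetical, not NLTK's parser or real corpus data; pointer locations are kept as raw strings.

```python
from typing import List, NamedTuple, Tuple


class AnnotationFields(NamedTuple):
    """Hypothetical container mirroring the NombankInstance attributes."""
    fileid: str
    sentnum: int
    wordnum: int
    baseform: str
    sensenumber: str  # kept as a string, e.g. "01"
    predicate: str  # raw pointer string, e.g. "8:0"
    predid: str  # text after the first "-" of the predicate item
    arguments: List[Tuple[str, str]]  # (argloc, argid), predicate excluded


def parse_annotation_line(line: str) -> AnnotationFields:
    """Split one (assumed) annotation line into instance fields."""
    pieces = line.split()
    if len(pieces) < 6:
        raise ValueError(f"badly formatted line: {line!r}")
    fileid, sentnum, wordnum, baseform, sensenumber = pieces[:5]
    items = pieces[5:]
    rel = [item for item in items if "-rel" in item]
    if len(rel) != 1:
        raise ValueError(f"expected exactly one '-rel' item: {line!r}")
    predloc, predid = rel[0].split("-", 1)
    arguments = []
    for item in items:
        if "-rel" in item:
            continue  # the predicate is not listed among the arguments
        # maxsplit=1 keeps identifiers such as 'ARGM-TMP' intact
        argloc, argid = item.split("-", 1)
        arguments.append((argloc, argid))
    return AnnotationFields(fileid, int(sentnum), int(wordnum), baseform,
                            sensenumber, predloc, predid, arguments)


# Made-up sample line, for illustration only.
fields = parse_annotation_line(
    "wsj/00/wsj_0001.mrg 0 8 board 01 8:0-rel 9:1-ARGM-TMP 6:1-ARG0")
print(fields.predicate, fields.predid)  # 8:0 rel
print(fields.arguments)  # [('9:1', 'ARGM-TMP'), ('6:1', 'ARG0')]
```

Splitting each item on the first "-" only is what lets multi-part identifiers like 'ARGM-TMP' survive as a single argid.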
- class nltk.corpus.reader.nombank.NombankPointer
Bases: object
A pointer used by NomBank to identify one or more constituents in a parse tree. NombankPointer is an abstract base class with three concrete subclasses:
NombankTreePointer is used to point to single constituents.
NombankSplitTreePointer is used to point to ‘split’ constituents, which consist of a sequence of two or more NombankTreePointer pointers.
NombankChainTreePointer is used to point to entire trace chains in a tree. It consists of a sequence of pieces, which can be NombankTreePointer or NombankSplitTreePointer pointers.
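The three pointer kinds nest in a fixed way: a chain holds tree or split pointers, and a split holds only tree pointers. A minimal sketch of a parser for their textual form, assuming the notation hinted at under NombankTreePointer: wordnum:height for one constituent, "," joining the pieces of a split constituent, and "*" joining the pieces of a chain. TreePointer, SplitPointer, ChainPointer, and parse_pointer are hypothetical stand-ins, not NLTK's classes.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class TreePointer:
    """Hypothetical stand-in for a single-constituent pointer."""
    wordnum: int
    height: int


@dataclass(frozen=True)
class SplitPointer:
    """Hypothetical stand-in for a split constituent (tree pointers only)."""
    pieces: Tuple[TreePointer, ...]


@dataclass(frozen=True)
class ChainPointer:
    """Hypothetical stand-in for a trace chain (tree or split pieces)."""
    pieces: Tuple[Union[TreePointer, SplitPointer], ...]


Pointer = Union[TreePointer, SplitPointer, ChainPointer]


def parse_pointer(s: str) -> Pointer:
    """Parse 'w:h', 'w:h,w:h' (split) or 'a*b*...' (chain) strings.

    '*' is checked first, so each chain piece may itself be a split.
    """
    if "*" in s:
        return ChainPointer(tuple(parse_pointer(p) for p in s.split("*")))
    if "," in s:
        return SplitPointer(tuple(parse_pointer(p) for p in s.split(",")))
    wordnum, height = s.split(":")  # raises ValueError if malformed
    return TreePointer(int(wordnum), int(height))


print(parse_pointer("3:1"))  # TreePointer(wordnum=3, height=1)
print(parse_pointer("5:0*0:1,2:0"))  # a chain whose second piece is a split
```

Checking "*" before "," is the design choice that enforces the nesting described above: a split can appear inside a chain, but never the reverse.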
- class nltk.corpus.reader.nombank.NombankSplitTreePointer
Bases: NombankPointer
- pieces
A list of the pieces that make up this split constituent. Elements are all NombankTreePointer pointers.
- class nltk.corpus.reader.nombank.NombankTreePointer
Bases: NombankPointer
wordnum:height*wordnum:height*… wordnum:height,
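The wordnum:height notation can be read as "find leaf number wordnum, then climb height levels". A minimal sketch of resolving such a pointer against a parse tree, assuming wordnum counts every leaf including traces (as stated for NombankInstance.wordnum) and that height 0 selects the word's part-of-speech node. The (label, children) tree encoding and the helpers leaf_paths, resolve, and node_at are hypothetical, not NLTK's API.

```python
from typing import List, Tuple

# Hypothetical minimal tree encoding: (label, children); leaves are strings.
Node = Tuple[str, list]


def leaf_paths(node: Node, path: Tuple[int, ...] = ()) -> List[Tuple[int, ...]]:
    """Return the child-index path of every leaf, left to right."""
    _label, children = node
    paths: List[Tuple[int, ...]] = []
    for i, child in enumerate(children):
        if isinstance(child, str):
            paths.append(path + (i,))  # a word or an empty element (trace)
        else:
            paths.extend(leaf_paths(child, path + (i,)))
    return paths


def resolve(tree: Node, wordnum: int, height: int) -> Tuple[int, ...]:
    """Return the path of the node that 'wordnum:height' points to."""
    leaf = leaf_paths(tree)[wordnum]  # traces are counted as words
    cut = len(leaf) - height - 1  # height 0 keeps the leaf's parent
    if cut < 0:
        raise ValueError("height climbs above the root")
    return leaf[:cut]


def node_at(tree: Node, path: Tuple[int, ...]) -> Node:
    """Follow a child-index path down from the root."""
    node = tree
    for i in path:
        node = node[1][i]
    return node


tree: Node = ("S", [
    ("NP-SBJ", [("DT", ["The"]), ("NN", ["board"])]),
    ("VP", [("VBD", ["met"]), ("NP", [("-NONE-", ["*T*-1"])])]),
])

print(node_at(tree, resolve(tree, 1, 0))[0])  # NN
print(node_at(tree, resolve(tree, 1, 1))[0])  # NP-SBJ
print(node_at(tree, resolve(tree, 3, 1))[0])  # NP (word 3 is the trace)
```

Because the trace *T*-1 is counted as word 3, the pointer 3:1 reaches the empty NP; skipping empty elements would shift every later word number.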