nltk.chunk.util module
- class nltk.chunk.util.ChunkScore
Bases: object
A utility class for scoring chunk parsers. ChunkScore can evaluate a chunk parser's output based on a number of statistics (precision, recall, f-measure, missed chunks, incorrect chunks). It can also combine the scores from the parsing of multiple texts, which makes it significantly easier to evaluate a chunk parser that operates one sentence at a time.
Texts are evaluated with the score method. The results of evaluation can be accessed via a number of accessor methods, such as precision and f_measure. A typical use of the ChunkScore class is:
>>> chunkscore = ChunkScore()
>>> for correct in correct_sentences:
...     guess = chunkparser.parse(correct.leaves())
...     chunkscore.score(correct, guess)
>>> print('F Measure:', chunkscore.f_measure())
F Measure: 0.823
- Variables
kwargs –
Keyword arguments:
max_tp_examples: The maximum number of actual examples of true positives to record. This affects the correct member function: correct will not return more than this number of true positive examples. This does not affect any of the numerical metrics (precision, recall, or f-measure).
max_fp_examples: The maximum number of actual examples of false positives to record. This affects the incorrect member function and the guessed member function: incorrect will not return more than this number of examples, and guessed will not return more than this number of false positive examples. This does not affect any of the numerical metrics (precision, recall, or f-measure).
max_fn_examples: The maximum number of actual examples of false negatives to record. This affects the missed member function and the correct member function: missed will not return more than this number of examples, and correct will not return more than this number of false negative examples. This does not affect any of the numerical metrics (precision, recall, or f-measure).
chunk_label: A regular expression indicating which chunks should be compared. Defaults to '.*' (i.e., all chunks).
_tp – List of true positives
_fp – List of false positives
_fn – List of false negatives
_tp_num – Number of true positives
_fp_num – Number of false positives
_fn_num – Number of false negatives.
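As a minimal end-to-end sketch of the scoring loop described above, the following builds two small gold-standard sentences with tagstr2tree and scores a deliberately weak chunker against them. The sentences and the grammar are made up for illustration, and RegexpParser (from nltk.chunk) is only one possible chunker; any object with a compatible parse method should work the same way.

```python
from nltk.chunk import RegexpParser
from nltk.chunk.util import ChunkScore, tagstr2tree

# Hypothetical gold-standard data: NP chunks are marked with square brackets.
correct_sentences = [
    tagstr2tree("[ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]"),
    tagstr2tree("[ John/NNP ] saw/VBD [ the/DT big/JJ dog/NN ]"),
]

# A deliberately narrow grammar: it only finds determiner + noun pairs.
chunkparser = RegexpParser("NP: {<DT><NN>}")

chunkscore = ChunkScore()
for correct in correct_sentences:
    # Strip the chunk structure, re-chunk the tagged tokens, then compare.
    guess = chunkparser.parse(correct.leaves())
    chunkscore.score(correct, guess)

# One of four gold chunks is found and nothing spurious is guessed,
# so precision is 1.0 and recall is 0.25.
print("Precision:", chunkscore.precision())
print("Recall:   ", chunkscore.recall())
print("F Measure:", chunkscore.f_measure())
```

Because the scores are accumulated across calls to score, the same ChunkScore object can be reused for a whole corpus that is parsed one sentence at a time.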
- accuracy()
Return the overall tag-based accuracy for all texts that have been scored by this ChunkScore, using the IOB (conll2000) tag encoding.
- Return type
float
- correct()
Return the chunks which were included in the correct chunk structures, listed in input order.
- Return type
list of chunks
- f_measure(alpha=0.5)
Return the overall F measure for all texts that have been scored by this ChunkScore.
- Parameters
alpha (float) – the relative weighting of precision and recall. Larger alpha biases the score towards the precision value, while smaller alpha biases the score towards the recall value. alpha should have a value in the range [0,1].
- Return type
float
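To make the effect of alpha concrete, here is a small standalone sketch. It assumes the F measure is the weighted harmonic mean 1 / (alpha / P + (1 - alpha) / R), which is consistent with the description of alpha above; the helper name weighted_f_measure is hypothetical and is not part of NLTK.

```python
def weighted_f_measure(precision, recall, alpha=0.5):
    """Weighted harmonic mean of precision and recall (illustrative helper)."""
    # Guard against division by zero when either score is zero.
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Made-up scores where precision is clearly higher than recall.
p, r = 0.9, 0.5

balanced = weighted_f_measure(p, r)                    # alpha = 0.5
precision_heavy = weighted_f_measure(p, r, alpha=0.9)  # closer to p
recall_heavy = weighted_f_measure(p, r, alpha=0.1)     # closer to r

print(recall_heavy, balanced, precision_heavy)
```

With alpha = 0.5 this reduces to the familiar balanced form 2PR / (P + R); moving alpha towards 1 pulls the result towards the precision value, and moving it towards 0 pulls it towards the recall value.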
- guessed()
Return the chunks which were included in the guessed chunk structures, listed in input order.
- Return type
list of chunks
- incorrect()
Return the chunks which were included in the guessed chunk structures, but not in the correct chunk structures, listed in input order.
- Return type
list of chunks
- missed()
Return the chunks which were included in the correct chunk structures, but not in the guessed chunk structures, listed in input order.
- Return type
list of chunks
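The four chunk accessors (correct, guessed, incorrect, missed) can be compared side by side on a single sentence. The sketch below uses a made-up sentence and a made-up RegexpParser grammar chosen so that one chunk is right, one is spurious, and one is missed; it is an illustration rather than a recommended grammar.

```python
from nltk.chunk import RegexpParser
from nltk.chunk.util import ChunkScore, tagstr2tree

# Hypothetical gold sentence with two NP chunks.
gold = tagstr2tree("[ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]")

# This grammar chunks "cat" alone (wrong span) and "the mat" (right span).
chunker = RegexpParser("NP: {<DT>?<NN>}")

score = ChunkScore()
score.score(gold, chunker.parse(gold.leaves()))


def words(chunk):
    """Return just the word strings of a chunk, dropping the POS tags."""
    return [word for word, tag in chunk.leaves()]


print("correct:  ", [words(c) for c in score.correct()])    # gold chunks
print("guessed:  ", [words(c) for c in score.guessed()])    # chunker output
print("incorrect:", [words(c) for c in score.incorrect()])  # guessed, not gold
print("missed:   ", [words(c) for c in score.missed()])     # gold, not guessed
```

Here incorrect contains the single-word chunk "cat", and missed contains "the little cat", because chunks only count as matching when both their position and their full contents agree.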
- precision()
Return the overall precision for all texts that have been scored by this ChunkScore.
- Return type
float
- nltk.chunk.util.accuracy(chunker, gold)
Score the accuracy of the chunker against the gold standard. Strip the chunk information from the gold standard and rechunk it using the chunker, then compute the accuracy score.
- Parameters
chunker (ChunkParserI) – The chunker being evaluated.
gold (tree) – The chunk structures to score the chunker on.
- Return type
float
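The following sketch shows one way the module-level accuracy function might be called. It assumes that gold can be passed as a list of chunk trees and that a RegexpParser is an acceptable chunker; the sentences and grammar are invented for illustration.

```python
from nltk.chunk import RegexpParser
from nltk.chunk.util import accuracy, tagstr2tree

# Hypothetical gold-standard chunk structures (NP chunks in brackets).
gold = [
    tagstr2tree("[ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]"),
    tagstr2tree("[ John/NNP ] saw/VBD [ the/DT big/JJ dog/NN ]"),
]

# A narrow grammar that only recognises determiner + noun.
chunker = RegexpParser("NP: {<DT><NN>}")

# Accuracy is computed per token over IOB tags, not per chunk.
acc = accuracy(chunker, gold)
print("Tag accuracy:", acc)
```

Because this score is tag-based, tokens outside any chunk that are correctly left unchunked also count as hits, so it can look more forgiving than chunk-level precision and recall.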
- nltk.chunk.util.conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), root_label='S')
Return a chunk structure for a single sentence encoded in the given CoNLL 2000 style string. This function converts a CoNLL IOB string into a tree. It uses the specified chunk types (defaults to NP, PP and VP), and creates a tree rooted at a node labeled S (by default).
- Parameters
s (str) – The CoNLL string to be converted.
chunk_types (tuple) – The chunk types to be converted.
root_label (str) – The node label to use for the root.
- Return type
Tree
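As a brief illustration of conllstr2tree, the sketch below converts a short, made-up CoNLL-style string (one token per line: word, POS tag, IOB chunk tag) and restricts chunk_types to NP, so the VP tag is treated as outside any chunk.

```python
from nltk.chunk.util import conllstr2tree

# Hypothetical CoNLL 2000 style input: word, POS tag, IOB chunk tag.
conll = """\
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
deficit NN I-NP
"""

# Only NP chunks are kept; other chunk types become plain tokens.
tree = conllstr2tree(conll, chunk_types=("NP",))

# Chunks are subtrees; unchunked tokens are (word, tag) tuples.
chunk_labels = [child.label() for child in tree if hasattr(child, "label")]
print(tree)
print(chunk_labels)
```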
- nltk.chunk.util.conlltags2tree(sentence, chunk_types=('NP', 'PP', 'VP'), root_label='S', strict=False)
Convert the CoNLL IOB format to a tree.
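A short sketch of conlltags2tree follows. It assumes the input is a list of (word, POS tag, IOB chunk tag) triples, and that strict=True rejects an I- tag that does not continue a chunk while the default quietly starts a new chunk; the example data is invented.

```python
from nltk.chunk.util import conlltags2tree

# Hypothetical sentence as (word, POS tag, IOB chunk tag) triples.
sentence = [
    ("the", "DT", "B-NP"),
    ("cat", "NN", "I-NP"),
    ("sat", "VBD", "O"),
]
tree = conlltags2tree(sentence)
print(tree)

# An I- tag with no chunk to continue is malformed IOB.
malformed = [("cat", "NN", "I-NP")]

# Non-strict mode (the default) recovers by opening a new chunk.
recovered = conlltags2tree(malformed)

# Strict mode is expected to raise ValueError instead.
try:
    conlltags2tree(malformed, strict=True)
    strict_raised = False
except ValueError:
    strict_raised = True

print(recovered, strict_raised)
```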
- nltk.chunk.util.ieerstr2tree(s, chunk_types=['LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CARDINAL', 'PERCENT', 'MONEY', 'MEASURE'], root_label='S')
Return a chunk structure containing the chunked tagged text that is encoded in the given IEER style string. Convert a string of chunked tagged text in the IEER named entity format into a chunk structure. Chunks are of several types: LOCATION, ORGANIZATION, PERSON, DURATION, DATE, CARDINAL, PERCENT, MONEY, and MEASURE.
- Return type
- nltk.chunk.util.tagstr2tree(s, chunk_label='NP', root_label='S', sep='/', source_tagset=None, target_tagset=None)
Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree. Chunks are marked by square brackets ([...]). Words are delimited by whitespace, and each word should have the form text/tag. Words that do not contain a slash are assigned a tag of None.
- Parameters
s (str) – The string to be converted
chunk_label (str) – The label to use for chunk nodes
root_label (str) – The label to use for the root of the tree
- Return type
Tree