nltk.classify.util module
Utility functions and classes for classifiers.
- class nltk.classify.util.CutoffChecker
Bases: object
A helper class that implements cutoff checks based on number of iterations and log likelihood.
Accuracy cutoffs are also implemented, but they’re almost never a good idea to use.
- nltk.classify.util.apply_features(feature_func, toks, labeled=None)
Use the LazyMap class to construct a lazy list-like object that is analogous to map(feature_func, toks). In particular, if labeled=False, then the returned list-like object's values are equal to:
[feature_func(tok) for tok in toks]
If labeled=True, then the returned list-like object's values are equal to:
[(feature_func(tok), label) for (tok, label) in toks]
The primary purpose of this function is to avoid the memory overhead involved in storing all the featuresets for every token in a corpus. Instead, these featuresets are constructed lazily, as-needed. The reduction in memory overhead can be especially significant when the underlying list of tokens is itself lazy (as is the case with many corpus readers).
- Parameters
feature_func – The function that will be applied to each token. It should return a featureset – i.e., a dict mapping feature names to feature values.
toks – The list of tokens to which feature_func should be applied. If labeled=False, then the list elements will be passed directly to feature_func(). If labeled=True, then the list elements should be tuples (tok, label), and tok will be passed to feature_func().
labeled – If true, then toks contains labeled tokens – i.e., tuples of the form (tok, label). (Default: auto-detect based on types.)
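The lazy behaviour described for apply_features can be illustrated with a small pure-Python sketch. This is not NLTK's implementation: LazyFeatureList, apply_features_sketch, and word_features are hypothetical names introduced here, and the auto-detection rule (treat the tokens as labeled when the first element is a tuple or list) is an assumption about what "auto-detect based on types" means.

```python
from collections.abc import Sequence


class LazyFeatureList(Sequence):
    """Hypothetical stand-in for the lazy list-like object described above."""

    def __init__(self, feature_func, toks, labeled):
        self._func = feature_func
        self._toks = toks
        self._labeled = labeled

    def __len__(self):
        return len(self._toks)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing stays lazy: wrap the sliced tokens, compute nothing yet.
            return LazyFeatureList(self._func, self._toks[index], self._labeled)
        item = self._toks[index]
        if self._labeled:
            tok, label = item
            return (self._func(tok), label)
        return self._func(item)


def apply_features_sketch(feature_func, toks, labeled=None):
    """Sketch of the documented semantics; no featureset is stored up front."""
    if labeled is None:
        # Assumption: labeled tokens are recognised by a tuple/list first element.
        labeled = bool(toks) and isinstance(toks[0], (tuple, list))
    return LazyFeatureList(feature_func, toks, labeled)


def word_features(word):
    """Toy feature function returning a featureset (a dict)."""
    return {"last_letter": word[-1], "length": len(word)}


labeled_names = [("Alice", "female"), ("Bob", "male")]
featuresets = apply_features_sketch(word_features, labeled_names)

# Each featureset is built only when it is accessed.
first = featuresets[0]
```

With NLTK installed, the corresponding call would be nltk.classify.util.apply_features(word_features, labeled_names). The sketch recomputes a featureset on every access, trading CPU time for memory.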