nltk.collections module
- class nltk.collections.AbstractLazySequence
Bases: object
An abstract base class for read-only sequences whose values are computed as needed. Lazy sequences act like tuples – they can be indexed, sliced, and iterated over; but they may not be modified.
The most common application of lazy sequences in NLTK is for corpus view objects, which provide access to the contents of a corpus without loading the entire corpus into memory, by loading pieces of the corpus from disk as needed.
The result of modifying a mutable element of a lazy sequence is undefined. In particular, the modifications made to the element may or may not persist, depending on whether and when the lazy sequence caches that element’s value or reconstructs it from scratch.
Subclasses are required to define two methods: __len__() and iterate_from().
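The two-method contract can be illustrated with a standard-library-only sketch. The names `SketchLazySequence` and `Squares` are hypothetical and invented for this example; this is not NLTK's implementation, only an approximation of how indexing, slicing, and iteration can be derived from `__len__()` and `iterate_from()`.

```python
from itertools import islice


class SketchLazySequence:
    """Hypothetical stand-in for a lazy sequence base class (not NLTK's code)."""

    def __len__(self):
        raise NotImplementedError("subclasses must define __len__()")

    def iterate_from(self, start):
        raise NotImplementedError("subclasses must define iterate_from()")

    def __iter__(self):
        # Iteration is iterate_from() starting at offset 0.
        return self.iterate_from(0)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Forward slices only in this sketch.
            start, stop, step = i.indices(len(self))
            count = max(stop - start, 0)
            return tuple(islice(self.iterate_from(start), 0, count, step))
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        return next(self.iterate_from(i))


class Squares(SketchLazySequence):
    """Computes n * n on demand instead of storing every value."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def iterate_from(self, start):
        for i in range(start, self._n):
            yield i * i


squares = Squares(1_000_000)
print(squares[3])      # 9
print(squares[10:13])  # (100, 121, 144)
```

No list of a million values is ever built; each access computes only what it returns.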
- class nltk.collections.LazyConcatenation
Bases: AbstractLazySequence
A lazy sequence formed by concatenating a list of lists. This underlying list of lists may itself be lazy.
LazyConcatenation maintains an index that it uses to keep track of the relationship between offsets in the concatenated lists and offsets in the sublists.
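One way to picture such an offset index is sketched below using only the standard library. `SketchConcatenation` and its `_offsets` list are hypothetical; the real class may organise its index differently. The sketch records where each sublist starts in the concatenation and grows that record only as far as a lookup requires.

```python
from bisect import bisect_right


class SketchConcatenation:
    """Hypothetical illustration of an offset index over sublists."""

    def __init__(self, list_of_lists):
        self._lists = list_of_lists
        # _offsets[k] is the position in the concatenation where sublist k
        # starts; the final entry is the total length indexed so far.
        self._offsets = [0]

    def _extend_index(self, upto):
        # Index further sublists until position `upto` is covered
        # or there are no sublists left.
        while (self._offsets[-1] <= upto
               and len(self._offsets) - 1 < len(self._lists)):
            k = len(self._offsets) - 1
            self._offsets.append(self._offsets[-1] + len(self._lists[k]))

    def __getitem__(self, i):
        if i < 0:
            raise IndexError("this sketch supports non-negative indexes only")
        self._extend_index(i)
        if i >= self._offsets[-1]:
            raise IndexError("index out of range")
        # Find the last sublist whose start offset is <= i.
        k = bisect_right(self._offsets, i) - 1
        return self._lists[k][i - self._offsets[k]]


concat = SketchConcatenation([["a", "b"], [], ["c"], ["d", "e", "f"]])
print(concat[0])  # a
print(concat[2])  # c
print(concat[5])  # f
```

Using `bisect_right` means empty sublists are skipped naturally, because the lookup lands on the last sublist that starts at or before the requested offset.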
- class nltk.collections.LazyEnumerate
Bases: LazyZip
A lazy sequence whose elements are tuples, each containing a count (from zero) and a value yielded by the underlying sequence. LazyEnumerate is useful for obtaining an indexed list. The tuples are constructed lazily – i.e., when you read a value from the list, LazyEnumerate will calculate that value by forming a tuple from the count of the i-th element and the i-th element of the underlying sequence.
LazyEnumerate is essentially a lazy version of the Python built-in function enumerate. In particular, the following two expressions are equivalent:
>>> from nltk.collections import LazyEnumerate
>>> sequence = ['first', 'second', 'third']
>>> list(enumerate(sequence))
[(0, 'first'), (1, 'second'), (2, 'third')]
>>> list(LazyEnumerate(sequence))
[(0, 'first'), (1, 'second'), (2, 'third')]
Lazy enumerations can be useful for conserving memory in cases where the argument sequences are particularly long.
A typical example of a use case for this class is obtaining an indexed list for a long sequence of values. By constructing tuples lazily and avoiding the creation of an additional long sequence, memory usage can be significantly reduced.
- class nltk.collections.LazyIteratorList
Bases: AbstractLazySequence
Wraps an iterator, loading its elements on demand and making them subscriptable. __repr__ displays only the first few elements.
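The idea of making an iterator subscriptable by loading elements on demand can be sketched with the standard library alone. `SketchIteratorList`, its `_cache` list, and the `PREVIEW` constant are hypothetical names for illustration; they are not taken from NLTK.

```python
from itertools import count


class SketchIteratorList:
    """Hypothetical wrapper that caches iterator output as it is requested."""

    PREVIEW = 3  # how many elements __repr__ shows (an assumption)

    def __init__(self, iterator):
        self._it = iter(iterator)
        self._cache = []

    def __getitem__(self, i):
        if i < 0:
            raise IndexError("this sketch supports non-negative indexes only")
        # Pull from the iterator only until position i has been cached.
        while len(self._cache) <= i:
            try:
                self._cache.append(next(self._it))
            except StopIteration:
                raise IndexError("index out of range") from None
        return self._cache[i]

    def __repr__(self):
        shown = []
        for i in range(self.PREVIEW + 1):
            try:
                shown.append(repr(self[i]))
            except IndexError:
                break
        if len(shown) > self.PREVIEW:
            shown[-1] = "..."  # signal that more elements exist
        return "[" + ", ".join(shown) + "]"


# An endless generator can still be indexed, because only a prefix is loaded.
lazy = SketchIteratorList(n * n for n in count(1))
print(lazy[4])  # 25
print(lazy)     # [1, 4, 9, ...]
```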
- class nltk.collections.LazyMap
Bases: AbstractLazySequence
A lazy sequence whose elements are formed by applying a given function to each element in one or more underlying lists. The function is applied lazily – i.e., when you read a value from the list, LazyMap will calculate that value by applying its function to the underlying lists’ value(s).
LazyMap is essentially a lazy version of the Python built-in function map. In particular, the following two expressions are equivalent:
>>> from nltk.collections import LazyMap
>>> function = str
>>> sequence = [1,2,3]
>>> list(map(function, sequence))
['1', '2', '3']
>>> list(LazyMap(function, sequence))
['1', '2', '3']
If the source lists do not have equal size, then the value None will be supplied for the ‘missing’ elements. This matches the behaviour of map in Python 2; map in Python 3 instead stops at the shortest input.
Lazy maps can be useful for conserving memory in cases where individual values take up a lot of space. This is especially true if the underlying list’s values are constructed lazily, as is the case with many corpus readers.
A typical example of a use case for this class is performing feature detection on the tokens in a corpus. Since featuresets are encoded as dictionaries, which can take up a lot of memory, using a LazyMap can significantly reduce memory usage when training and running classifiers.
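The feature-detection use case can be sketched without NLTK as follows. `SketchLazyMap` and `extract_features` are hypothetical; the sketch handles a single underlying list and integer indexes only, and is meant to show when the function runs rather than to reproduce the real class.

```python
class SketchLazyMap:
    """Hypothetical single-list lazy map: func runs only when a value is read."""

    def __init__(self, func, seq):
        self._func = func
        self._seq = seq

    def __len__(self):
        return len(self._seq)

    def __getitem__(self, i):
        # Integer indexes only in this sketch.
        return self._func(self._seq[i])

    def __iter__(self):
        return (self._func(x) for x in self._seq)


seen = []  # records which tokens have actually been processed


def extract_features(token):
    seen.append(token)
    return {
        "lower": token.lower(),
        "is_title": token.istitle(),
        "length": len(token),
    }


tokens = ["The", "quick", "brown", "fox"]
featuresets = SketchLazyMap(extract_features, tokens)

print(seen)            # [] (nothing has been computed yet)
print(featuresets[1])  # {'lower': 'quick', 'is_title': False, 'length': 5}
print(seen)            # ['quick']
```

Only one feature dictionary exists at a time while iterating, which is where the memory saving comes from when the token list is large.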
- class nltk.collections.LazySubsequence
Bases: AbstractLazySequence
A subsequence produced by slicing a lazy sequence. This slice keeps a reference to its source sequence, and generates its values by looking them up in the source sequence.
- MIN_SIZE = 100
The minimum size for which lazy slices should be created. If LazySubsequence() is called with a subsequence that is shorter than MIN_SIZE, then a tuple will be returned instead.
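The size threshold can be pictured with the following standard-library sketch. `SketchSubsequence` is a hypothetical class, and the use of `__new__` to return a tuple for short slices is an assumption about one way to obtain the documented behaviour.

```python
from itertools import islice


class SketchSubsequence:
    """Hypothetical lazy slice that falls back to a tuple when short."""

    MIN_SIZE = 100

    def __new__(cls, source, start, stop):
        # Short slices are cheap to copy, so materialise them immediately.
        if stop - start < cls.MIN_SIZE:
            return tuple(islice(iter(source), start, stop))
        return super().__new__(cls)

    def __init__(self, source, start, stop):
        # Only reached for long slices; keep a reference, copy nothing.
        self._source = source
        self._start = start
        self._stop = stop

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        # Values are looked up in the source sequence on demand.
        return self._source[self._start + i]


source = range(1000)
short = SketchSubsequence(source, 10, 15)
long_slice = SketchSubsequence(source, 100, 900)

print(short)                   # (10, 11, 12, 13, 14)
print(type(long_slice).__name__, len(long_slice), long_slice[0])
```

Because `__new__` returns a plain tuple for short slices, Python skips `__init__` in that case and the caller receives an ordinary immutable sequence.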
- class nltk.collections.LazyZip
Bases: LazyMap
A lazy sequence whose elements are tuples, each containing the i-th element from each of the argument sequences. The returned list is truncated in length to the length of the shortest argument sequence. The tuples are constructed lazily – i.e., when you read a value from the list, LazyZip will calculate that value by forming a tuple from the i-th element of each of the argument sequences.
LazyZip is essentially a lazy version of the Python built-in function zip. In particular, an evaluated LazyZip is equivalent to a zip:
>>> from nltk.collections import LazyZip
>>> sequence1, sequence2 = [1, 2, 3], ['a', 'b', 'c']
>>> list(zip(sequence1, sequence2))
[(1, 'a'), (2, 'b'), (3, 'c')]
>>> list(LazyZip(sequence1, sequence2))
[(1, 'a'), (2, 'b'), (3, 'c')]
>>> sequences = [sequence1, sequence2, [6,7,8,9]]
>>> list(zip(*sequences)) == list(LazyZip(*sequences))
True
Lazy zips can be useful for conserving memory in cases where the argument sequences are particularly long.
A typical example of a use case for this class is combining long sequences of gold standard and predicted values in a classification or tagging task in order to calculate accuracy. By constructing tuples lazily and avoiding the creation of an additional long sequence, memory usage can be significantly reduced.
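The accuracy calculation mentioned above might look like the following sketch. `SketchLazyZip` is a hypothetical, standard-library-only stand-in, and the tag lists are made-up illustration data rather than results from any real tagger.

```python
class SketchLazyZip:
    """Hypothetical lazy zip: tuples are built only when they are read."""

    def __init__(self, *seqs):
        self._seqs = seqs

    def __len__(self):
        # Truncated to the shortest argument sequence.
        return min(len(s) for s in self._seqs)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        return tuple(s[i] for s in self._seqs)

    def __iter__(self):
        return (self[i] for i in range(len(self)))


# Made-up gold-standard and predicted tags for five tokens.
gold = ["NN", "VB", "DT", "NN", "JJ"]
predicted = ["NN", "VB", "DT", "VB", "JJ"]

pairs = SketchLazyZip(gold, predicted)
correct = sum(1 for g, p in pairs if g == p)
accuracy = correct / len(pairs)
print(accuracy)  # 0.8
```

Unlike the iterator returned by Python 3's built-in `zip`, the object here has a length and can be indexed, while still never holding the full list of pairs in memory.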
- class nltk.collections.OrderedDict
Bases: dict
- popitem()
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- setdefault(key, failobj=None)
Insert key with a value of failobj if key is not in the dictionary.
Return the value for key if key is in the dictionary, else failobj.
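The documented semantics of these two methods can be illustrated with a plain built-in `dict` on Python 3.7 or later, where insertion order is preserved. That `nltk.collections.OrderedDict` matches the built-in in every edge case is an assumption of this sketch; only the behaviour described above is shown.

```python
# Uses the built-in dict as a stand-in (an assumption, see above).
d = {"a": 1, "b": 2}

# setdefault returns the existing value and leaves the dict unchanged...
existing = d.setdefault("a", 0)
# ...or inserts the fallback and returns it when the key is missing.
inserted = d.setdefault("c", 3)
print(existing, inserted)  # 1 3

# popitem removes pairs in LIFO order: the most recently inserted first.
last = d.popitem()
print(last)  # ('c', 3)

# Popping from an empty dict raises KeyError.
d.clear()
empty_raises = False
try:
    d.popitem()
except KeyError:
    empty_raises = True
print(empty_raises)  # True
```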
- class nltk.collections.Trie
Bases: dict
A Trie implementation for strings.
- LEAF = True
- __init__(strings=None)
Builds a Trie object, which is built around a dict. If strings is provided, each string in the list is added to the Trie; otherwise, an empty Trie is constructed.
- Parameters
strings (list(str)) – List of strings to insert into the trie (default is None)
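A dict-based trie with a leaf marker can be sketched as below. `SketchTrie`, and in particular its `insert` and `contains` methods, are hypothetical and written for illustration; the sketch assumes that the `LEAF` key marks the end of a stored string, which the documentation above does not spell out.

```python
class SketchTrie(dict):
    """Hypothetical dict-based trie; each node maps a character to a child."""

    LEAF = True  # assumed to mark the end of a complete string

    def __init__(self, strings=None):
        super().__init__()
        if strings:
            for string in strings:
                self.insert(string)

    def insert(self, string):
        node = self
        for ch in string:
            # Reuse the child node if present, otherwise create it.
            node = node.setdefault(ch, SketchTrie())
        node[SketchTrie.LEAF] = None

    def contains(self, string):
        node = self
        for ch in string:
            if ch not in node:
                return False
            node = node[ch]
        # A prefix alone does not count; the leaf marker must be present.
        return SketchTrie.LEAF in node


trie = SketchTrie(["ab", "abc", "b"])
print(trie.contains("ab"))  # True
print(trie.contains("a"))   # False
print(SketchTrie())         # {}
```

Because each node is itself a dict, the whole structure can be inspected or compared like ordinary nested dictionaries.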