nltk.lm.preprocessing module
- nltk.lm.preprocessing.flatten(iterable, /)
Alias for itertools.chain.from_iterable: an alternative chain() constructor that takes a single iterable of iterables and evaluates it lazily.
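As a minimal sketch of the lazy flattening described above, the snippet below chains two tokenized sentences into one stream of words. The toy sentences and the variable names are illustrative assumptions, not part of the NLTK documentation.

```python
from nltk.lm.preprocessing import flatten

# Hypothetical toy input: two already-padded, tokenized sentences.
sentences = [["<s>", "a", "</s>"], ["<s>", "b", "c", "</s>"]]

# flatten returns a lazy iterator; nothing is consumed until we ask for items.
stream = flatten(sentences)

first = next(stream)  # pulls only the first word
rest = list(stream)   # consumes the remaining words

print(first)  # <s>
print(rest)   # ['a', '</s>', '<s>', 'b', 'c', '</s>']
```

Because the result is a one-shot iterator, it has to be rebuilt (or materialized with list()) if the words are needed more than once.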
- nltk.lm.preprocessing.padded_everygram_pipeline(order, text)
Default preprocessing for a sequence of sentences.
Creates two iterators:
- the sentences, each padded at both ends and turned into a sequence of nltk.util.everygrams
- the same padded sentences chained together into a flat stream of words
- Parameters
order – Largest ngram length produced by everygrams.
text (Iterable[Iterable[str]]) – Text to iterate over. Expected to be an iterable of sentences.
- Returns
A pair of iterators: one over the text as ngrams (one sequence of everygrams per sentence), and one over the text as vocabulary data (a flat stream of padded words).
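The following is a small usage sketch of the pipeline under stated assumptions: the two-sentence corpus and the variable names are made up for illustration, and order=2 is chosen only to keep the output short. The text is passed as a list on the assumption that it must be iterable more than once, since both returned iterators draw from it. The everygrams are sorted before printing because their yield order may differ between NLTK versions.

```python
from nltk.lm.preprocessing import padded_everygram_pipeline

# Hypothetical toy corpus: a list of tokenized sentences.
text = [["a", "b"], ["c"]]

# order=2: each sentence is padded for bigrams, then expanded into
# all ngrams of length 1 and 2.
ngrams_per_sentence, vocab_stream = padded_everygram_pipeline(2, text)

# Both results are lazy, so materialize them to inspect the contents.
ngram_lists = [sorted(sent_ngrams) for sent_ngrams in ngrams_per_sentence]
vocab_words = list(vocab_stream)

print(ngram_lists[0])  # unigrams and bigrams of the first padded sentence
print(vocab_words)     # ['<s>', 'a', 'b', '</s>', '<s>', 'c', '</s>']
```

In typical use the two iterators are handed to the fit method of an nltk.lm model (training ngrams first, vocabulary text second) rather than materialized like this.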