nltk.corpus.reader.knbc module
- class nltk.corpus.reader.knbc.KNBCorpusReader
Bases: SyntaxCorpusReader
- This class implements:
  - __init__, which specifies the location of the corpus and a method for detecting the sentence blocks in corpus files.
  - _read_block, which reads a block from the input stream.
  - _word, which takes a block and returns a list of lists of words.
  - _tag, which takes a block and returns a list of lists of tagged words.
  - _parse, which takes a block and returns a list of parsed sentences.
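The division of labour among these methods can be illustrated without NLTK. The following is a minimal, self-contained sketch, not the reader's actual implementation. The names read_blocks, block_words, block_tagged, block_chunks and SAMPLE are hypothetical. The sketch also assumes a KNP-style file layout: sentences are separated by blank lines; lines starting with "#", "*" or "+", and the "EOS" line, carry structure rather than morphemes; morpheme fields are separated by single spaces; and a bunsetsu header such as "* 1D" gives the index of the chunk it depends on, with -1 marking the root. The sample sentence is hand-written, not taken from the corpus.

```python
import re
from typing import Iterator, List, Tuple

# Assumed structural markers: "#" comment/ID lines, "*" bunsetsu headers,
# "+" basic-phrase headers, and the "EOS" end-of-sentence line.
HEADER = re.compile(r"EOS|\*|#|\+")

# Hand-written sample in the assumed KNP-style layout (not real corpus data).
SAMPLE = """# S-ID:1
* 1D
+ 1D
天気 てんき 天気 名詞 6 普通名詞 1 * 0 * 0
が が が 助詞 9 格助詞 1 * 0 * 0
* -1D
+ -1D
いい いい いい 形容詞 3 * 0 イ形容詞イ段 19 基本形 2
EOS
"""


def read_blocks(text: str) -> Iterator[str]:
    """Yield blank-line-separated blocks (a stand-in for _read_block)."""
    block: List[str] = []
    for line in text.splitlines():
        if line.strip():
            block.append(line)
        elif block:
            yield "\n".join(block)
            block = []
    if block:
        yield "\n".join(block)


def block_words(block: str) -> List[str]:
    """Return the first field of every morpheme line (cf. _word)."""
    return [line.strip().split(" ")[0]
            for line in block.splitlines() if not HEADER.match(line)]


def block_tagged(block: str) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return (word, tags) pairs; tags holds every field of the line (cf. _tag)."""
    result = []
    for line in block.splitlines():
        if not HEADER.match(line):
            cells = tuple(line.strip().split(" "))
            result.append((cells[0], cells))
    return result


def block_chunks(block: str) -> List[Tuple[int, List[str]]]:
    """Group words by bunsetsu, pairing each chunk with its head index (-1 = root)."""
    chunks: List[Tuple[int, List[str]]] = []
    for line in block.splitlines():
        if line.startswith("* "):
            # Assumed header form "* <head><type>", where type is one of D/P/I/A.
            head = int(line.split(" ")[1].rstrip("DPIA"))
            chunks.append((head, []))
        elif not HEADER.match(line) and chunks:
            chunks[-1][1].append(line.strip().split(" ")[0])
    return chunks


block = next(read_blocks(SAMPLE))
print(block_words(block))   # ['天気', 'が', 'いい']
print(block_chunks(block))  # [(1, ['天気', 'が']), (-1, ['いい'])]
```

block_chunks only recovers chunk-level heads. A full parser such as the one behind _parse may build a richer structure from the same header lines.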
- The structure of tagged words:
    tagged_word = (word(str), tags(tuple))
    tags = (surface, reading, lemma, pos1, posid1, pos2, posid2, pos3, posid3, others …)
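To make the field order concrete, here is a small sketch that attaches the names listed above to the positions of a tags tuple. MorphTags and name_tags are hypothetical helpers, not part of NLTK. The sketch assumes every field is a string, as it would be after splitting a line of text, and that at least the nine named fields are present. Anything after posid3 is collected into others. The example tagged word is hand-written.

```python
from typing import NamedTuple, Tuple


class MorphTags(NamedTuple):
    """Hypothetical named view of the tags tuple of a tagged word."""
    surface: str
    reading: str
    lemma: str
    pos1: str
    posid1: str
    pos2: str
    posid2: str
    pos3: str
    posid3: str
    others: Tuple[str, ...]


def name_tags(tags: Tuple[str, ...]) -> MorphTags:
    """Split tags into its nine fixed fields plus the remaining fields."""
    if len(tags) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(tags)}")
    return MorphTags(*tags[:9], others=tuple(tags[9:]))


# Hand-written tagged word (assumed shape: (word, tags)), not corpus output.
tagged_word = ("天気", ("天気", "てんき", "天気", "名詞", "6",
                        "普通名詞", "1", "*", "0", "*", "0"))
word, tags = tagged_word
named = name_tags(tags)
print(word, named.reading, named.pos1)  # 天気 てんき 名詞
print(named.others)                     # ('*', '0')
```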
Usage example
>>> from nltk.corpus.util import LazyCorpusLoader
>>> from nltk.corpus.reader.knbc import KNBCorpusReader
>>> knbc = LazyCorpusLoader(
...     'knbc/corpus1',
...     KNBCorpusReader,
...     r'.*/KN.*',
...     encoding='euc-jp',
... )
>>> len(knbc.sents()[0])
9