nltk.corpus.reader.indian module
Indian Language POS-Tagged Corpus. Collected by A Kumaran, Microsoft Research, India. Distributed with permission.
- Contents:
Bangla: IIT Kharagpur
Hindi: Microsoft Research India
Marathi: IIT Bombay
Telugu: IIIT Hyderabad
- class nltk.corpus.reader.indian.IndianCorpusReader
Bases: CorpusReader
List of words, one per line. Blank lines are ignored.
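The docstring above says little about how a line is laid out. A minimal sketch follows. It assumes that each line holds one sentence of whitespace-separated word_TAG tokens and that lines beginning with "<" are header markup to skip. That is our reading of the distributed files, not a documented guarantee. The helper names split_word_tag and parse_indian_line are hypothetical. NLTK itself splits such tokens with nltk.tag.str2tuple, which split_word_tag imitates by splitting at the last separator and upper-casing the tag.

```python
def split_word_tag(token, sep="_"):
    """Split a 'word_TAG' token at its LAST separator and upper-case the tag.

    Hypothetical helper imitating nltk.tag.str2tuple; a token with no
    separator gets the tag None.
    """
    loc = token.rfind(sep)
    if loc >= 0:
        return (token[:loc], token[loc + len(sep):].upper())
    return (token, None)


def parse_indian_line(line):
    """Parse one line into a list of (word, tag) pairs (hypothetical helper).

    Assumes one sentence per line; lines starting with '<' are treated as
    markup and yield an empty list.
    """
    if line.startswith("<"):
        return []
    return [split_word_tag(tok) for tok in line.split()]


# Usage with a made-up, romanized example line:
print(parse_indian_line("raam_NNP ghar_NN gayaa_VM\n"))
# [('raam', 'NNP'), ('ghar', 'NN'), ('gayaa', 'VM')]
```

Splitting at the last separator, rather than the first, keeps any word that itself contains an underscore intact.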
- class nltk.corpus.reader.indian.IndianCorpusView
Bases: StreamBackedCorpusView
- __init__(corpus_file, encoding, tagged, group_by_sent, tag_mapping_function=None)
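The tagged and group_by_sent flags decide what each block contributes to the view. In NLTK's IndianCorpusReader, words(), tagged_words(), sents() and tagged_sents() correspond to the four combinations of these two flags. The hypothetical shape_block function below sketches that behaviour for a sentence that has already been parsed into (word, tag) pairs. It applies tag_mapping_function, if one is given, to every tag first. It is an illustration, not NLTK's code.

```python
from typing import Callable, List, Optional, Tuple

TaggedToken = Tuple[str, str]


def shape_block(
    sent: List[TaggedToken],
    tagged: bool,
    group_by_sent: bool,
    tag_mapping_function: Optional[Callable[[str], str]] = None,
) -> list:
    """Return what one parsed sentence would add to the view (hypothetical helper)."""
    # Optionally rewrite every tag, e.g. to a coarser tagset.
    if tag_mapping_function is not None:
        sent = [(word, tag_mapping_function(tag)) for (word, tag) in sent]
    # Drop the tags when the caller only wants words.
    items = sent if tagged else [word for (word, _) in sent]
    # Grouped output wraps the sentence, so each block adds exactly one item.
    return [items] if group_by_sent else items


sent = [("raam", "NNP"), ("ghar", "NN")]
print(shape_block(sent, tagged=False, group_by_sent=False))  # flat words, like words()
print(shape_block(sent, tagged=True, group_by_sent=True))    # one tagged sentence, like tagged_sents()
```

Wrapping the grouped result in a one-element list lets a single flattening consumer handle both modes. Each block yields either several tokens or one sentence.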
Create a new corpus view, based on the file fileid, and read with block_reader. See the class documentation for more information.
- Parameters
fileid – The path to the file that is read by this corpus view. fileid can either be a string or a PathPointer.
startpos – The file position at which the view will start reading. This can be used to skip over preface sections.
encoding – The unicode encoding that should be used to read the file’s contents. If no encoding is specified, then the file’s contents will be read as a non-unicode byte string (bytes in Python 3) rather than as decoded text.
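To show how startpos and encoding interact, here is a simplified, forward-only sketch of a stream-backed reader. It treats startpos as a byte offset. For that reason it opens the file in binary mode, seeks, and only then wraps the stream in a decoder for the given encoding. The function iter_blocks and its parameters are hypothetical. NLTK's real StreamBackedCorpusView does more: it keeps track of block offsets so that the view can be indexed and sliced lazily.

```python
import io
from typing import Callable, Iterator, TextIO


def iter_blocks(
    path: str,
    block_reader: Callable[[TextIO], list],
    encoding: str = "utf-8",
    startpos: int = 0,
) -> Iterator:
    """Yield every item produced by block_reader, reading forward to EOF.

    Hypothetical helper: startpos is a byte offset, so the file is opened in
    binary mode, positioned, and then wrapped in a text decoder.
    """
    with open(path, "rb") as raw:
        raw.seek(startpos)  # e.g. skip a preface section
        with io.TextIOWrapper(raw, encoding=encoding) as stream:
            while True:
                before = stream.tell()
                block = block_reader(stream)
                yield from block
                if stream.tell() == before:  # nothing consumed: end of file
                    break
```

In this sketch, a block_reader is any callable that reads from the stream and returns a list. Returning an empty list for a blank line is fine, because the loop stops only when a call consumes no input at all.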