nltk.corpus.reader.panlex_swadesh module
- class nltk.corpus.reader.panlex_swadesh.PanlexLanguage
Bases: tuple
PanlexLanguage(panlex_uid, iso639, iso639_type, script, name, langvar_uid)
- static __new__(_cls, panlex_uid, iso639, iso639_type, script, name, langvar_uid)
Create new instance of PanlexLanguage(panlex_uid, iso639, iso639_type, script, name, langvar_uid)
- iso639
Alias for field number 1
- iso639_type
Alias for field number 2
- langvar_uid
Alias for field number 5
- name
Alias for field number 4
- panlex_uid
Alias for field number 0
- script
Alias for field number 3
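Because PanlexLanguage is a named tuple, each field can be read either by name or by the field number listed above. The following is a minimal sketch that uses a stand-in namedtuple with the same field order instead of importing NLTK; the field values are illustrative assumptions, not entries taken from the corpus.

```python
from collections import namedtuple

# Stand-in with the same field order as the documented PanlexLanguage tuple.
# (Assumption: defined locally so the example runs without NLTK installed.)
PanlexLanguage = namedtuple(
    "PanlexLanguage",
    ["panlex_uid", "iso639", "iso639_type", "script", "name", "langvar_uid"],
)

# Illustrative values only; they are not read from the PanLex data files.
lang = PanlexLanguage("eng-000", "eng", "I", "Latn", "English", "187")

# Access by field name and by field number refer to the same value.
by_name = lang.iso639
by_index = lang[1]
same_value = by_name == by_index

# Like any tuple, the record can be unpacked positionally.
panlex_uid, iso639, iso639_type, script, name, langvar_uid = lang

# _asdict() gives a name-to-value mapping, which is handy for inspection.
as_mapping = lang._asdict()
print(as_mapping)
```

Access by name is usually the more readable choice, since it does not depend on remembering the field order.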
- class nltk.corpus.reader.panlex_swadesh.PanlexSwadeshCorpusReader
Bases: WordListCorpusReader
This class reads the PanLex Swadesh list, which comes from:
David Kamholz, Jonathan Pool, and Susan M. Colowick (2014). PanLex: Building a Resource for Panlingual Lexical Translation. In LREC. http://www.lrec-conf.org/proceedings/lrec2014/pdf/1029_Paper.pdf
License: CC0 1.0 Universal https://creativecommons.org/publicdomain/zero/1.0/legalcode
- __init__(*args, **kwargs)
- Parameters
root (PathPointer or str) – A path pointer identifying the root directory for this corpus. If a string is specified, then it will be converted to a PathPointer automatically.
fileids – A list of the files that make up this corpus. This list can either be specified explicitly, as a list of strings; or implicitly, as a regular expression over file paths. The absolute path for each file will be constructed by joining the reader’s root to each file name.
encoding – The default unicode encoding for the files that make up the corpus. The value of encoding can be any of the following:
  - A string: encoding is the encoding name for all files.
  - A dictionary: encoding[file_id] is the encoding name for the file whose identifier is file_id. If file_id is not in encoding, then the file contents will be processed using non-unicode byte strings.
  - A list: encoding should be a list of (regexp, encoding) tuples. The encoding for a file whose identifier is file_id will be the encoding value for the first tuple whose regexp matches the file_id. If no tuple’s regexp matches the file_id, the file contents will be processed using non-unicode byte strings.
  - None: the file contents of all files will be processed using non-unicode byte strings.
tagset – The name of the tagset used by this corpus, to be used for normalizing or converting the POS tags returned by the tagged_...() methods.
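The four accepted forms of encoding can be easier to follow as code. The sketch below is a hypothetical helper, resolve_encoding, written only to illustrate the rules described above; it is not NLTK's implementation. It assumes that regular expressions are matched from the start of the file identifier with re.match, and the file identifiers used are made up for the example. A return value of None stands for the "non-unicode byte strings" case.

```python
import re


def resolve_encoding(encoding, file_id):
    """Hypothetical helper: pick the encoding name for one file identifier.

    Returns None when the file should be read as raw byte strings.
    """
    # None: every file is processed as bytes.
    # A string: the same encoding name applies to all files.
    if encoding is None or isinstance(encoding, str):
        return encoding

    # A dictionary: look up the file identifier; a missing key means bytes.
    if isinstance(encoding, dict):
        return encoding.get(file_id)

    # A list of (regexp, encoding) tuples: the first matching regexp wins.
    # Assumption: matching is anchored at the start of file_id (re.match).
    for regexp, name in encoding:
        if re.match(regexp, file_id):
            return name

    # No tuple matched: fall back to bytes.
    return None


# Made-up file identifiers, used only to exercise each form.
as_string = resolve_encoding("utf8", "swadesh110/eng-000.txt")
as_dict_hit = resolve_encoding({"a.txt": "latin-1"}, "a.txt")
as_dict_miss = resolve_encoding({"a.txt": "latin-1"}, "b.txt")
rules = [(r".*\.txt", "utf8"), (r".*", "ascii")]
as_list_first = resolve_encoding(rules, "swadesh110/eng-000.txt")
as_list_second = resolve_encoding(rules, "README")
as_list_none = resolve_encoding([(r".*\.txt", "utf8")], "README")
as_none = resolve_encoding(None, "a.txt")
```

Ordering matters in the list form: a broad pattern placed first would shadow any more specific pattern that follows it.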