nltk.langnames module
Translate between language names and language codes.
The iso639-3 language codes were downloaded from the registration authority at https://iso639-3.sil.org/
The iso639-3 code set is evolving, so retired language codes are kept in the “iso639retired” dictionary, which the wrapper functions “langname” and “langcode” use as a fallback to support the lookup of retired codes.
The “langcode” function returns the current iso639-3 code if there is one, and falls back to the retired code otherwise. As specified by BCP-47, it returns the shortest (2-letter) code by default, but 3-letter codes are also available:
>>> import nltk.langnames as lgn
>>> lgn.langname('fri')  # 'fri' is a retired code
'Western Frisian'

The current code is different from the retired one:

>>> lgn.langcode('Western Frisian')
'fy'

>>> lgn.langcode('Western Frisian', typ=3)
'fry'
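The fallback order described above can be pictured with a small self-contained sketch. The three tables and the `toy_` function names are hypothetical stand-ins holding a few hand-picked entries; they are not NLTK's data or its implementation, only an illustration of trying current codes first and retired codes last.

```python
# Toy tables with a few hand-picked entries (assumed for illustration only).
CURRENT_2 = {"fy": "Western Frisian", "el": "Modern Greek (1453-)"}
CURRENT_3 = {"fry": "Western Frisian", "ell": "Modern Greek (1453-)"}
RETIRED = {"fri": "Western Frisian"}


def toy_langname(code):
    """Look up a name, trying current codes before retired ones."""
    for table in (CURRENT_2, CURRENT_3, RETIRED):
        if code in table:
            return table[code]
    return None  # unknown code


def toy_langcode(name, typ=2):
    """Look up a code, preferring the 2-letter form unless typ=3."""
    order = (CURRENT_3, CURRENT_2) if typ == 3 else (CURRENT_2, CURRENT_3)
    # Retired codes are consulted only after both current tables.
    for table in order + (RETIRED,):
        for code, known_name in table.items():
            if known_name == name:
                return code
    return None  # unknown name


print(toy_langname("fri"))                     # Western Frisian
print(toy_langcode("Western Frisian"))         # fy
print(toy_langcode("Western Frisian", typ=3))  # fry
```

Because the retired table is searched last, a name that has a current code never resolves to its retired one.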
- nltk.langnames.lang2q(name)
Convert simple language name to Wikidata Q-code
>>> from nltk.langnames import lang2q
>>> lang2q('Low German')
'Q25433'
- nltk.langnames.langcode(name, typ=2)
Convert language name to iso639-3 language code. Returns the short 2-letter code by default, if one is available, and the 3-letter code otherwise:
>>> from nltk.langnames import langcode
>>> langcode('Modern Greek (1453-)')
'el'

Specify ‘typ=3’ to get the 3-letter code:
>>> langcode('Modern Greek (1453-)', typ=3)
'ell'
- nltk.langnames.langname(tag, typ='full')
Convert a composite BCP-47 tag to a language name
>>> from nltk.langnames import langname
>>> langname('ca-Latn-ES-valencia')
'Catalan: Latin: Spain: Valencian'

>>> langname('ca-Latn-ES-valencia', typ="short")
'Catalan'
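To show how a composite tag relates to the colon-separated name, here is a minimal sketch that splits a tag on hyphens and names each subtag by its shape. The lookup tables and `toy_tag_name` are hypothetical and cover only the example tag; real BCP-47 tags also allow extended language subtags, extensions, and private-use subtags, which this sketch ignores, and NLTK's own parsing may differ.

```python
# Toy subtag tables covering only the example tag (assumed for illustration).
LANGUAGES = {"ca": "Catalan"}
SCRIPTS = {"Latn": "Latin"}
REGIONS = {"ES": "Spain"}
VARIANTS = {"valencia": "Valencian"}


def toy_tag_name(tag, typ="full"):
    """Name each subtag of a language[-script][-region][-variant] tag."""
    subtags = tag.split("-")
    # The first subtag is always the language.
    names = [LANGUAGES[subtags[0].lower()]]
    for sub in subtags[1:]:
        if len(sub) == 4 and sub.isalpha():
            # Four letters: a script subtag such as "Latn".
            names.append(SCRIPTS[sub.title()])
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            # Two letters or three digits: a region subtag such as "ES".
            names.append(REGIONS[sub.upper()])
        else:
            # Anything else is treated as a variant in this sketch.
            names.append(VARIANTS[sub.lower()])
    if typ == "short":
        return names[0]  # only the language name
    return ": ".join(names)


print(toy_tag_name("ca-Latn-ES-valencia"))               # Catalan: Latin: Spain: Valencian
print(toy_tag_name("ca-Latn-ES-valencia", typ="short"))  # Catalan
```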
- nltk.langnames.q2name(qcode, typ='full')
Convert Wikidata Q-code to BCP-47 (full or short) language name
>>> from nltk.langnames import q2name
>>> q2name('Q4289225')
'Low German: Mecklenburg-Vorpommern'

>>> q2name('Q4289225', "short")
'Low German'
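The two Q-code conversions, lang2q and q2name, can be read as lookups that pass through a language tag in opposite directions. The sketch below illustrates that idea with hypothetical tables and `toy_` helpers. The Q-codes and names are taken from the examples on this page, while the tags paired with them are assumptions made for the sketch, and none of this is claimed to mirror NLTK's internals.

```python
# Toy tables keyed by language tag. Q-codes and names follow the examples on
# this page; the tags themselves are assumed for illustration.
TAG_TO_Q = {"nds": "Q25433", "nds-u-sd-demv": "Q4289225"}
TAG_TO_NAME = {
    "nds": "Low German",
    "nds-u-sd-demv": "Low German: Mecklenburg-Vorpommern",
}

# Invert each table once so lookups work in both directions.
Q_TO_TAG = {qcode: tag for tag, qcode in TAG_TO_Q.items()}
NAME_TO_TAG = {name: tag for tag, name in TAG_TO_NAME.items()}


def toy_lang2q(name):
    """Name -> tag -> Q-code."""
    return TAG_TO_Q[NAME_TO_TAG[name]]


def toy_q2name(qcode, typ="full"):
    """Q-code -> tag -> name, optionally keeping only the language part."""
    name = TAG_TO_NAME[Q_TO_TAG[qcode]]
    return name if typ == "full" else name.split(":")[0]


print(toy_lang2q("Low German"))         # Q25433
print(toy_q2name("Q4289225"))           # Low German: Mecklenburg-Vorpommern
print(toy_q2name("Q4289225", "short"))  # Low German
```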