nltk.translate.api module
- class nltk.translate.api.AlignedSent
Bases: object
Return an aligned sentence object, which encapsulates two sentences along with an Alignment between them.
Typically used in machine translation to represent a sentence and its translation.
>>> from nltk.translate import AlignedSent, Alignment
>>> algnsent = AlignedSent(['klein', 'ist', 'das', 'Haus'],
...     ['the', 'house', 'is', 'small'], Alignment.fromstring('0-3 1-2 2-0 3-1'))
>>> algnsent.words
['klein', 'ist', 'das', 'Haus']
>>> algnsent.mots
['the', 'house', 'is', 'small']
>>> algnsent.alignment
Alignment([(0, 3), (1, 2), (2, 0), (3, 1)])
>>> from nltk.corpus import comtrans
>>> print(comtrans.aligned_sents()[54])
<AlignedSent: 'Weshalb also sollten...' -> 'So why should EU arm...'>
>>> print(comtrans.aligned_sents()[54].alignment)
0-0 0-1 1-0 2-2 3-4 3-5 4-7 5-8 6-3 7-9 8-9 9-10 9-11 10-12 11-6 12-6 13-13
- Parameters
words (list(str)) – Words in the target language sentence
mots (list(str)) – Words in the source language sentence
alignment (Alignment) – Word-level alignments between words and mots. Each alignment is represented as a 2-tuple (words_index, mots_index).
- property alignment
- property mots
- property words
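The (words_index, mots_index) convention can be illustrated without NLTK. The following is a minimal plain-Python sketch that reuses the data from the example above; the helper name aligned_pairs is hypothetical and is not part of nltk.translate.

```python
# Plain-Python sketch: no NLTK import, only the data from the doctest above.
words = ['klein', 'ist', 'das', 'Haus']
mots = ['the', 'house', 'is', 'small']
alignment = [(0, 3), (1, 2), (2, 0), (3, 1)]  # (words_index, mots_index)


def aligned_pairs(words, mots, alignment):
    """Hypothetical helper: resolve index pairs to (word, mot) string pairs.

    Pairs are returned in order of their position in `words`.
    """
    return [(words[i], mots[j]) for i, j in sorted(alignment)]


pairs = aligned_pairs(words, mots, alignment)
for word, mot in pairs:
    print(f"{word} -> {mot}")
```

With a real AlignedSent, the same lookup would presumably use the words, mots, and alignment properties listed above in place of the three local variables.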
- class nltk.translate.api.Alignment
Bases: frozenset
A storage class for representing alignment between two sequences, s1, s2. In general, an alignment is a set of tuples of the form (i, j, …) representing an alignment between the i-th element of s1 and the j-th element of s2. Tuples are extensible (they might contain additional data, such as a boolean to indicate sure vs possible alignments).
>>> from nltk.translate import Alignment
>>> a = Alignment([(0, 0), (0, 1), (1, 2), (2, 2)])
>>> a.invert()
Alignment([(0, 0), (1, 0), (2, 1), (2, 2)])
>>> print(a.invert())
0-0 1-0 2-1 2-2
>>> a[0]
[(0, 1), (0, 0)]
>>> a.invert()[2]
[(2, 1), (2, 2)]
>>> b = Alignment([(0, 0), (0, 1)])
>>> b.issubset(a)
True
>>> c = Alignment.fromstring('0-0 0-1')
>>> b == c
True
- classmethod fromstring(s)
Read a giza-formatted string and return an Alignment object.
>>> Alignment.fromstring('0-0 2-1 9-2 21-3 10-4 7-5')
Alignment([(0, 0), (2, 1), (7, 5), (9, 2), (10, 4), (21, 3)])
- Parameters
s (str) – the positional alignments in giza format
- Return type
Alignment
- Returns
An Alignment object corresponding to the string representation s.
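To make the giza format concrete, here is a minimal sketch of parsing it by hand and of inverting the resulting pairs. It uses only the standard library and is not NLTK's implementation; the function names parse_giza and invert_pairs are hypothetical, and the sketch assumes every token has exactly the form "i-j".

```python
def parse_giza(s):
    """Hypothetical helper: turn '0-0 2-1' into a frozenset of (i, j) int pairs.

    Assumes each whitespace-separated token is exactly 'i-j'.
    """
    pairs = set()
    for token in s.split():
        i, j = token.split('-')
        pairs.add((int(i), int(j)))
    return frozenset(pairs)


def invert_pairs(pairs):
    """Hypothetical helper: swap the two positions of every pair."""
    return frozenset((j, i) for i, j in pairs)


parsed = parse_giza('0-0 2-1 9-2 21-3 10-4 7-5')
print(sorted(parsed))

inverted = invert_pairs(parse_giza('0-0 0-1 1-2 2-2'))
print(sorted(inverted))
```

Because the result is a frozenset, duplicate tokens collapse to one pair, and sorting is needed only for display.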
- class nltk.translate.api.PhraseTable
Bases: object
In-memory store of translations for a given phrase, and the log probability of those translations.
- add(src_phrase, trg_phrase, log_prob)
- Parameters
log_prob (float) – Log probability that, given src_phrase, trg_phrase is its translation
- translations_for(src_phrase)
Get the translations for a source language phrase.
- Parameters
src_phrase (tuple(str)) – Source language phrase of interest
- Returns
A list of target language phrases that are translations of src_phrase, ordered in decreasing order of likelihood. Each list element is a tuple of the target phrase and its log probability.
- Return type
list(PhraseTableEntry)
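The behavior described for add and translations_for can be sketched with a dictionary of sorted lists. This is an illustrative stand-in under stated assumptions, not the NLTK class: the names SimplePhraseTable and Entry are hypothetical, and the probabilities are made up for the example.

```python
from collections import namedtuple
from math import log

# Hypothetical stand-in for an entry: a target phrase plus its log probability.
Entry = namedtuple('Entry', ['trg_phrase', 'log_prob'])


class SimplePhraseTable:
    """Illustrative sketch of an in-memory phrase table (not NLTK's class)."""

    def __init__(self):
        self._table = {}  # maps src_phrase (tuple of str) -> list of Entry

    def add(self, src_phrase, trg_phrase, log_prob):
        entries = self._table.setdefault(src_phrase, [])
        entries.append(Entry(trg_phrase, log_prob))
        # Keep the most likely translation first.
        entries.sort(key=lambda e: e.log_prob, reverse=True)

    def translations_for(self, src_phrase):
        return self._table[src_phrase]


table = SimplePhraseTable()
table.add(('das', 'Haus'), ('the', 'home'), log(0.2))   # made-up probability
table.add(('das', 'Haus'), ('the', 'house'), log(0.8))  # made-up probability

best = table.translations_for(('das', 'Haus'))[0]
print(best.trg_phrase)
```

Phrases are tuples of strings so they can serve as dictionary keys, which matches the tuple(str) type documented for src_phrase.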