nltk.translate.ibm4 module
Translation model that reorders output words based on their type and distance from other related words in the output sentence.
IBM Model 4 improves the distortion model of Model 3, motivated by the observation that certain words tend to be re-ordered in a predictable way relative to one another. For example, <adjective><noun> in English usually has its order flipped as <noun><adjective> in French.
Model 4 requires words in the source and target vocabularies to be categorized into classes. The classes can be linguistically driven, such as parts of speech (adjectives, nouns, prepositions, etc.), or obtained by statistical methods. The original IBM Model 4 uses an information-theoretic approach to group words into 50 classes for each vocabulary.
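As a minimal sketch of the linguistically driven option, the snippet below turns a word-to-tag lookup into the integer class ids that Model 4 expects. The helper `build_word_classes` and the tag names are hypothetical and exist only for illustration; they are not part of NLTK.

```python
def build_word_classes(word_to_tag):
    """Map each word to an integer class id, one id per distinct tag.

    Hypothetical helper: ids are assigned in order of first appearance.
    """
    class_ids = {}
    word_classes = {}
    for word, tag in word_to_tag.items():
        # Reuse the id of a tag seen before, otherwise allocate the next id.
        word_classes[word] = class_ids.setdefault(tag, len(class_ids))
    return word_classes


# Illustrative part-of-speech labels (assumed, not taken from a tagger).
pos_of = {
    "the": "DET", "a": "DET",
    "small": "ADJ", "big": "ADJ",
    "house": "NOUN", "book": "NOUN",
}
word_classes = build_word_classes(pos_of)
print(word_classes)
```

Any scheme works as long as every word in a vocabulary maps to exactly one integer id.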
Terminology
- Cept
A source word with non-zero fertility, i.e., one that is aligned to one or more target words.
- Tablet
The set of target word(s) aligned to a cept.
- Head of cept
The first word of the tablet of that cept.
- Center of cept
The average position of the words in that cept’s tablet. If the value is not an integer, the ceiling is taken. For example, for a tablet with words in positions 2, 5, 6 in the target sentence, the center of the corresponding cept is ceil((2 + 5 + 6) / 3) = 5.
- Displacement
For a head word, defined as (position of head word - position of previous cept’s center). Can be positive or negative. For a non-head word, defined as (position of non-head word - position of previous word in the same tablet). Always positive, because successive words in a tablet are assumed to appear to the right of the previous word.
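The definitions of center and displacement can be checked with a few lines of arithmetic. This is an illustrative sketch only: the function names are hypothetical, and the previous cept's center (4) is an assumed value chosen for the example.

```python
from math import ceil


def center_of_cept(tablet_positions):
    """Ceiling of the mean target position of the words in a tablet."""
    return ceil(sum(tablet_positions) / len(tablet_positions))


def displacements(tablet_positions, previous_center):
    """Displacement of the head word, followed by those of the non-head words."""
    ordered = sorted(tablet_positions)
    # Head word: measured from the center of the previous cept.
    head = ordered[0] - previous_center
    # Non-head words: measured from the previous word in the same tablet.
    non_head = [cur - prev for prev, cur in zip(ordered, ordered[1:])]
    return [head] + non_head


center = center_of_cept([2, 5, 6])
dj = displacements([2, 5, 6], previous_center=4)  # previous center is assumed
print(center)  # 5
print(dj)      # [-2, 3, 1]
```

The head displacement is negative here because the head word (position 2) lies to the left of the assumed previous center, while the non-head displacements are positive by construction.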
In contrast to Model 3, which reorders words in a tablet independently of other words, Model 4 distinguishes between three cases:
Words generated by NULL are distributed uniformly.
For a head word t, its position is modeled by the probability d_head(displacement | word_class_s(s), word_class_t(t)), where s is the previous cept, and word_class_s and word_class_t map s and t to a source and target language word class respectively.
For a non-head word t, its position is modeled by the probability d_non_head(displacement | word_class_t(t))
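The three cases can be sketched as a single lookup function. Everything here is hypothetical: the tables are plain dictionaries with invented values, keyed as (displacement, source class, target class) and (displacement, target class), and the NULL case is simplified to a uniform 1/m over the m target positions, which is an assumption made for illustration rather than a description of NLTK's internals.

```python
# Hypothetical tables with invented probabilities.
d_head = {(1, 0, 1): 0.7}      # key: (displacement, source class, target class)
d_non_head = {(2, 1): 0.4}     # key: (displacement, target class)


def distortion_prob(case, dj=None, src_class=None, trg_class=None, m=None):
    """Return the distortion probability for one target word."""
    if case == "null":
        # Assumption: uniform over the m positions of the target sentence.
        return 1.0 / m
    if case == "head":
        # Conditioned on the class of the previous cept and of the head word.
        return d_head.get((dj, src_class, trg_class), 0.0)
    if case == "non_head":
        # Conditioned only on the class of the target word itself.
        return d_non_head.get((dj, trg_class), 0.0)
    raise ValueError(f"unknown case: {case!r}")


p_head = distortion_prob("head", dj=1, src_class=0, trg_class=1)
p_non_head = distortion_prob("non_head", dj=2, trg_class=1)
p_null = distortion_prob("null", m=4)
print(p_head, p_non_head, p_null)
```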
The EM algorithm used in Model 4 is:
- E step
In the training data, collect counts, weighted by prior probabilities.
count how many times a source language word is translated into a target language word
for a particular word class, count how many times a head word is located at a particular displacement from the previous cept’s center
for a particular word class, count how many times a non-head word is located at a particular displacement from the previous target word
count how many times a source word is aligned to phi number of target words
count how many times NULL is aligned to a target word
- M step
Estimate new probabilities based on the counts from the E step
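The two steps can be illustrated for the lexical translation counts alone. This is a toy sketch, not NLTK's training loop: the sampled alignments and their weights are invented, and `m_step` is a hypothetical helper. The table is indexed as table[target word][source word].

```python
from collections import defaultdict


def m_step(counts):
    """Normalize counts[t][s] into probabilities P(t | s)."""
    totals = defaultdict(float)
    for by_source in counts.values():
        for s, c in by_source.items():
            totals[s] += c
    return {
        t: {s: c / totals[s] for s, c in by_source.items()}
        for t, by_source in counts.items()
    }


# E step (toy): each sampled alignment adds its weight to the counts of
# every (target word, source word) link it contains. Weights are invented.
samples = [
    ([("buch", "book"), ("das", "the")], 0.75),
    ([("buch", "the"), ("das", "book")], 0.25),
]
counts = defaultdict(lambda: defaultdict(float))
for links, weight in samples:
    for t, s in links:
        counts[t][s] += weight

# M step: re-estimate the translation probabilities from the counts.
table = m_step(counts)
print(table["buch"]["book"])  # 0.75
```

The distortion, fertility, and NULL-insertion counts would be normalized in the same way, each over its own conditioning variables.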
As with Model 3, there are too many possible alignments to consider exhaustively. Thus, a hill-climbing approach is used to sample good candidates.
Notations
- i
Position in the source sentence. Valid values are 0 (for NULL), 1, 2, …, length of source sentence.
- j
Position in the target sentence. Valid values are 1, 2, …, length of target sentence.
- l
Number of words in the source sentence, excluding NULL
- m
Number of words in the target sentence
- s
A word in the source language
- t
A word in the target language
- phi
Fertility, the number of target words produced by a source word
- p1
Probability that a target word produced by a source word is accompanied by another target word that is aligned to NULL
- p0
1 - p1
- dj
Displacement, Δj
References
Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19 (2), 263-311.
- class nltk.translate.ibm4.IBMModel4
Bases: IBMModel
Translation model that reorders output words based on their type and their distance from other related words in the output sentence.
>>> from nltk.translate import AlignedSent
>>> from nltk.translate.ibm4 import IBMModel4
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
>>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))
>>> src_classes = {'the': 0, 'a': 0, 'small': 1, 'big': 1, 'house': 2, 'book': 2, 'is': 3, 'was': 3, 'i': 4, 'summarize': 5 }
>>> trg_classes = {'das': 0, 'ein': 0, 'haus': 1, 'buch': 1, 'klein': 2, 'groß': 2, 'ist': 3, 'war': 3, 'ja': 4, 'ich': 5, 'fasse': 6, 'zusammen': 6 }

>>> ibm4 = IBMModel4(bitext, 5, src_classes, trg_classes)

>>> print(round(ibm4.translation_table['buch']['book'], 3))
1.0
>>> print(round(ibm4.translation_table['das']['book'], 3))
0.0
>>> print(round(ibm4.translation_table['ja'][None], 3))
1.0

>>> print(round(ibm4.head_distortion_table[1][0][1], 3))
1.0
>>> print(round(ibm4.head_distortion_table[2][0][1], 3))
0.0
>>> print(round(ibm4.non_head_distortion_table[3][6], 3))
0.5

>>> print(round(ibm4.fertility_table[2]['summarize'], 3))
1.0
>>> print(round(ibm4.fertility_table[1]['book'], 3))
1.0

>>> print(round(ibm4.p1, 3))
0.033

>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, None), (4, 3)])
- __init__(sentence_aligned_corpus, iterations, source_word_classes, target_word_classes, probability_tables=None)
Train on sentence_aligned_corpus and create a lexical translation model, distortion models, a fertility model, and a model for generating NULL-aligned words.
Translation direction is from AlignedSent.mots to AlignedSent.words.
- Parameters
sentence_aligned_corpus (list(AlignedSent)) – Sentence-aligned parallel corpus
iterations (int) – Number of iterations to run training algorithm
source_word_classes (dict[str]: int) – Lookup table that maps a source word to its word class, the latter represented by an integer id
target_word_classes (dict[str]: int) – Lookup table that maps a target word to its word class, the latter represented by an integer id
probability_tables (dict[str]: object) – Optional. Use this to pass in custom probability values. If not specified, probabilities will be set to a uniform distribution, or some other sensible value. If specified, all the following entries must be present: translation_table, alignment_table, fertility_table, p1, head_distortion_table, non_head_distortion_table. See IBMModel and IBMModel4 for the type and purpose of these tables.
- prob_t_a_given_s(alignment_info)
Probability of target sentence and an alignment given the source sentence.