nltk.translate.meteor_score module
- nltk.translate.meteor_score.align_words(hypothesis: Iterable[str], reference: Iterable[str], stemmer: StemmerI = PorterStemmer(), wordnet: WordNetCorpusReader = nltk.corpus.wordnet) -> Tuple[List[Tuple[int, int]], List[Tuple[int, str]], List[Tuple[int, str]]]
Aligns words in the hypothesis to words in the reference by sequentially applying exact match, stemmed match, and WordNet-based synonym match. When there are multiple possible matches, the one with the fewest crossings is chosen.
- Parameters
hypothesis (Iterable[str]) – pre-tokenized hypothesis
reference (Iterable[str]) – pre-tokenized reference
stemmer (StemmerI) – nltk.stem.api.StemmerI object (default PorterStemmer())
wordnet (WordNetCorpusReader) – a wordnet corpus reader object (default nltk.corpus.wordnet)
- Returns
sorted list of matched tuples, unmatched hypothesis list, unmatched reference list
- Return type
Tuple[List[Tuple[int, int]], List[Tuple[int, str]], List[Tuple[int, str]]]
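The notion of a "crossing" used when choosing between alignments can be illustrated with a small sketch. Here `count_crossings` is a hypothetical helper written for this example, not a function in NLTK; it assumes an alignment is given as (hypothesis index, reference index) pairs, as in the first element of the return value above.

```python
from itertools import combinations
from typing import List, Tuple


def count_crossings(alignment: List[Tuple[int, int]]) -> int:
    """Count pairs of alignment links that cross each other.

    Two links (h1, r1) and (h2, r2) cross when their hypothesis positions
    and their reference positions are ordered in opposite directions.
    """
    return sum(
        1
        for (h1, r1), (h2, r2) in combinations(alignment, 2)
        if (h1 - h2) * (r1 - r2) < 0
    )


monotone = [(0, 0), (1, 1), (2, 2)]  # same word order on both sides
swapped = [(0, 1), (1, 0), (2, 2)]   # first two words are swapped

print(count_crossings(monotone))  # 0
print(count_crossings(swapped))   # 1
```

Under this definition, an alignment that preserves word order has no crossings, so preferring fewer crossings favors alignments that keep matched words in the same relative order.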
- nltk.translate.meteor_score.exact_match(hypothesis: Iterable[str], reference: Iterable[str]) -> Tuple[List[Tuple[int, int]], List[Tuple[int, str]], List[Tuple[int, str]]]
Matches identical words in the hypothesis and the reference and returns a word mapping based on each word's enumerated index in the hypothesis and in the reference.
- Parameters
hypothesis (Iterable[str]) – pre-tokenized hypothesis
reference (Iterable[str]) – pre-tokenized reference
- Returns
enumerated matched tuples, enumerated unmatched hypothesis tuples, enumerated unmatched reference tuples
- Return type
Tuple[List[Tuple[int, int]], List[Tuple[int, str]], List[Tuple[int, str]]]
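The shape of the three returned lists may be easier to see in a simplified sketch. The function below, `exact_match_sketch`, is a hypothetical illustration and not NLTK's implementation; in particular, the left-to-right scan order is an assumption and may pair repeated words differently from the library.

```python
from typing import Iterable, List, Tuple

Matches = List[Tuple[int, int]]
Unmatched = List[Tuple[int, str]]


def exact_match_sketch(
    hypothesis: Iterable[str], reference: Iterable[str]
) -> Tuple[Matches, Unmatched, Unmatched]:
    """Greedy one-to-one matching of identical words (illustrative only)."""
    remaining_ref = list(enumerate(reference))
    matches: Matches = []
    unmatched_hyp: Unmatched = []
    for h_idx, h_word in enumerate(hypothesis):
        for pos, (r_idx, r_word) in enumerate(remaining_ref):
            if h_word == r_word:
                matches.append((h_idx, r_idx))
                remaining_ref.pop(pos)  # each reference word is used at most once
                break
        else:
            # No reference word was identical to this hypothesis word.
            unmatched_hyp.append((h_idx, h_word))
    return matches, unmatched_hyp, remaining_ref


matched, hyp_left, ref_left = exact_match_sketch(
    ["the", "cat", "sat"], ["the", "dog", "sat"]
)
print(matched)   # [(0, 0), (2, 2)]
print(hyp_left)  # [(1, 'cat')]
print(ref_left)  # [(1, 'dog')]
```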
- nltk.translate.meteor_score.meteor_score(references: Iterable[Iterable[str]], hypothesis: Iterable[str], preprocess: Callable[[str], str] = str.lower, stemmer: StemmerI = PorterStemmer(), wordnet: WordNetCorpusReader = nltk.corpus.wordnet, alpha: float = 0.9, beta: float = 3.0, gamma: float = 0.5) -> float
Calculates the METEOR score for a hypothesis with multiple references, as described in “Meteor: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments” by Alon Lavie and Abhaya Agarwal, in Proceedings of ACL. https://www.cs.cmu.edu/~alavie/METEOR/pdf/Lavie-Agarwal-2007-METEOR.pdf
When multiple references are given, the best score is chosen: this function calls single_meteor_score for each reference and returns the highest score obtained for the given hypothesis.
>>> hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which', 'ensures', 'that', 'the', 'military', 'always', 'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hypothesis2 = ['It', 'is', 'to', 'insure', 'the', 'troops', 'forever', 'hearing', 'the', 'activity', 'guidebook', 'that', 'party', 'direct']
>>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that', 'ensures', 'that', 'the', 'military', 'will', 'forever', 'heed', 'Party', 'commands']
>>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which', 'guarantees', 'the', 'military', 'forces', 'always', 'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the', 'army', 'always', 'to', 'heed', 'the', 'directions', 'of', 'the', 'party']
>>> round(meteor_score([reference1, reference2, reference3], hypothesis1), 4)
0.6944
If no words match during alignment, the method returns a score of 0. Returning zero instead of raising a division-by-zero error is safe because no match usually implies a bad translation.
>>> round(meteor_score([['this', 'is', 'a', 'cat']], ['non', 'matching', 'hypothesis']), 4)
0.0
- Parameters
references (Iterable[Iterable[str]]) – pre-tokenized reference sentences
hypothesis (Iterable[str]) – a pre-tokenized hypothesis sentence
preprocess (Callable[[str], str]) – preprocessing function (default str.lower)
stemmer (StemmerI) – nltk.stem.api.StemmerI object (default PorterStemmer())
wordnet (WordNetCorpusReader) – a wordnet corpus reader object (default nltk.corpus.wordnet)
alpha (float) – parameter for controlling relative weights of precision and recall.
beta (float) – parameter for controlling the shape of the penalty as a function of fragmentation.
gamma (float) – relative weight assigned to fragmentation penalty.
- Returns
The sentence-level METEOR score.
- Return type
float
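The "best score among all references" behavior described above amounts to taking a maximum over per-reference scores. The sketch below shows that pattern with a deliberately simple stand-in scorer; `best_over_references` and `toy_overlap` are hypothetical names for this illustration, and `toy_overlap` is not METEOR. It assumes at least one reference is supplied.

```python
from typing import Callable, Iterable, List

Scorer = Callable[[List[str], List[str]], float]


def toy_overlap(reference: List[str], hypothesis: List[str]) -> float:
    """Stand-in scorer: fraction of hypothesis words found in the reference."""
    if not hypothesis:
        return 0.0
    ref_words = set(reference)
    return sum(word in ref_words for word in hypothesis) / len(hypothesis)


def best_over_references(
    references: Iterable[Iterable[str]],
    hypothesis: Iterable[str],
    scorer: Scorer,
) -> float:
    """Score the hypothesis against each reference and keep the best score."""
    hyp = list(hypothesis)
    return max(scorer(list(ref), hyp) for ref in references)


refs = [["a", "b"], ["a", "c", "d"]]
print(best_over_references(refs, ["a", "c"], toy_overlap))  # 1.0
```

Because only the maximum is kept, adding a poorly matching reference never lowers the result.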
- nltk.translate.meteor_score.single_meteor_score(reference: Iterable[str], hypothesis: Iterable[str], preprocess: Callable[[str], str] = str.lower, stemmer: StemmerI = PorterStemmer(), wordnet: WordNetCorpusReader = nltk.corpus.wordnet, alpha: float = 0.9, beta: float = 3.0, gamma: float = 0.5) -> float
Calculates the METEOR score for a single hypothesis and reference, as described in “Meteor: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments” by Alon Lavie and Abhaya Agarwal, in Proceedings of ACL. https://www.cs.cmu.edu/~alavie/METEOR/pdf/Lavie-Agarwal-2007-METEOR.pdf
>>> hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which', 'ensures', 'that', 'the', 'military', 'always', 'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that', 'ensures', 'that', 'the', 'military', 'will', 'forever', 'heed', 'Party', 'commands']
>>> round(single_meteor_score(reference1, hypothesis1), 4)
0.6944
If no words match during alignment, the method returns a score of 0. Returning zero instead of raising a division-by-zero error is safe because no match usually implies a bad translation.
>>> round(single_meteor_score(['this', 'is', 'a', 'cat'], ['non', 'matching', 'hypothesis']), 4)
0.0
- Parameters
reference (Iterable[str]) – pre-tokenized reference
hypothesis (Iterable[str]) – pre-tokenized hypothesis
preprocess (Callable[[str], str]) – preprocessing function (default str.lower)
stemmer (StemmerI) – nltk.stem.api.StemmerI object (default PorterStemmer())
wordnet (WordNetCorpusReader) – a wordnet corpus reader object (default nltk.corpus.wordnet)
alpha (float) – parameter for controlling relative weights of precision and recall.
beta (float) – parameter for controlling the shape of the penalty as a function of fragmentation.
gamma (float) – relative weight assigned to fragmentation penalty.
- Returns
The sentence-level METEOR score.
- Return type
float
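To show how alpha, beta, and gamma enter the score, here is a minimal sketch that follows the formulation in the cited Lavie and Agarwal paper: a weighted harmonic mean of precision and recall, scaled down by a fragmentation penalty. The function `meteor_from_counts` is hypothetical and takes the match and chunk counts as inputs rather than computing an alignment, so it is not a substitute for single_meteor_score.

```python
def meteor_from_counts(
    matches: int,
    chunks: int,
    hyp_len: int,
    ref_len: int,
    alpha: float = 0.9,
    beta: float = 3.0,
    gamma: float = 0.5,
) -> float:
    """Combine alignment counts into a METEOR-style score (illustrative only).

    matches: number of aligned word pairs
    chunks:  number of contiguous, same-order runs of aligned words
    """
    if matches == 0:
        # No aligned words: return 0 rather than dividing by zero.
        return 0.0
    precision = matches / hyp_len
    recall = matches / ref_len
    # alpha controls the relative weight of precision and recall.
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # Fragmentation is the number of chunks per matched word.
    fragmentation = chunks / matches
    # beta shapes the penalty curve; gamma sets its maximum weight.
    penalty = gamma * fragmentation ** beta
    return (1 - penalty) * fmean


# Four words, all matched in one contiguous chunk: small penalty.
print(round(meteor_from_counts(4, 1, 4, 4), 4))  # 0.9922
# Same matches, but every word is its own chunk: maximum penalty.
print(round(meteor_from_counts(4, 4, 4, 4), 4))  # 0.5
```

With the default gamma of 0.5, a fully fragmented alignment halves the harmonic mean, while a single contiguous chunk leaves it almost unchanged.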
- nltk.translate.meteor_score.stem_match(hypothesis: Iterable[str], reference: Iterable[str], stemmer: StemmerI = PorterStemmer()) -> Tuple[List[Tuple[int, int]], List[Tuple[int, str]], List[Tuple[int, str]]]
Stems each word in the hypothesis and the reference, matches words whose stems are identical, and returns a word mapping between the hypothesis and the reference.
- Parameters
hypothesis (Iterable[str]) – pre-tokenized hypothesis
reference (Iterable[str]) – pre-tokenized reference
stemmer (StemmerI) – nltk.stem.api.StemmerI object (default PorterStemmer())
- Returns
enumerated matched tuples, enumerated unmatched hypothesis tuples, enumerated unmatched reference tuples
- Return type
Tuple[List[Tuple[int, int]], List[Tuple[int, str]], List[Tuple[int, str]]]
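The idea of matching on stems rather than surface forms can be sketched without a real stemmer. In the example below, `toy_stem` is a hypothetical suffix stripper standing in for a StemmerI object such as PorterStemmer, and `stem_match_sketch` is an illustration, not NLTK's implementation; the scan order is an assumption.

```python
from typing import Callable, Iterable, List, Tuple

Matches = List[Tuple[int, int]]
Unmatched = List[Tuple[int, str]]


def toy_stem(word: str) -> str:
    """Stand-in for a real stemmer: strips a few common English suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stem_match_sketch(
    hypothesis: Iterable[str],
    reference: Iterable[str],
    stem: Callable[[str], str] = toy_stem,
) -> Tuple[Matches, Unmatched, Unmatched]:
    """Greedy one-to-one matching of words whose stems are identical."""
    remaining_ref = list(enumerate(reference))
    matches: Matches = []
    unmatched_hyp: Unmatched = []
    for h_idx, h_word in enumerate(hypothesis):
        h_stem = stem(h_word)
        for pos, (r_idx, r_word) in enumerate(remaining_ref):
            if stem(r_word) == h_stem:
                matches.append((h_idx, r_idx))
                remaining_ref.pop(pos)  # each reference word is used at most once
                break
        else:
            unmatched_hyp.append((h_idx, h_word))
    return matches, unmatched_hyp, remaining_ref


print(stem_match_sketch(["commands", "obeys"], ["command", "heed"]))
# ([(0, 0)], [(1, 'obeys')], [(1, 'heed')])
```

Here "commands" and "command" are paired because they share a stem, even though an exact comparison would leave both unmatched.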
- nltk.translate.meteor_score.wordnetsyn_match(hypothesis: Iterable[str], reference: Iterable[str], wordnet: WordNetCorpusReader = nltk.corpus.wordnet) -> Tuple[List[Tuple[int, int]], List[Tuple[int, str]], List[Tuple[int, str]]]
Matches a word in the reference to a word in the hypothesis if any synonym of the hypothesis word exactly matches the reference word.
- Parameters
hypothesis (Iterable[str]) – pre-tokenized hypothesis
reference (Iterable[str]) – pre-tokenized reference
wordnet (WordNetCorpusReader) – a wordnet corpus reader object (default nltk.corpus.wordnet)
- Returns
enumerated matched tuples, enumerated unmatched hypothesis tuples, enumerated unmatched reference tuples
- Return type
Tuple[List[Tuple[int, int]], List[Tuple[int, str]], List[Tuple[int, str]]]
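A rough sketch of synonym-based matching follows. To stay self-contained it uses a small hand-written dictionary, `TOY_SYNONYMS`, in place of a WordNet corpus reader; both that dictionary and `synonym_match_sketch` are hypothetical. The sketch also treats a word as a synonym of itself, which is an assumption of this example.

```python
from typing import Dict, Iterable, List, Set, Tuple

Matches = List[Tuple[int, int]]
Unmatched = List[Tuple[int, str]]

# Hand-written stand-in for WordNet lookups (illustrative data only).
TOY_SYNONYMS: Dict[str, Set[str]] = {
    "army": {"military", "troops"},
    "heed": {"obey", "mind"},
}


def synonym_match_sketch(
    hypothesis: Iterable[str],
    reference: Iterable[str],
    synonyms: Dict[str, Set[str]],
) -> Tuple[Matches, Unmatched, Unmatched]:
    """Match a reference word to a hypothesis word via the latter's synonyms."""
    remaining_ref = list(enumerate(reference))
    matches: Matches = []
    unmatched_hyp: Unmatched = []
    for h_idx, h_word in enumerate(hypothesis):
        # Candidate forms: the hypothesis word itself plus its synonyms.
        candidates = {h_word} | synonyms.get(h_word, set())
        for pos, (r_idx, r_word) in enumerate(remaining_ref):
            if r_word in candidates:
                matches.append((h_idx, r_idx))
                remaining_ref.pop(pos)  # each reference word is used at most once
                break
        else:
            unmatched_hyp.append((h_idx, h_word))
    return matches, unmatched_hyp, remaining_ref


print(synonym_match_sketch(["the", "army", "heed"], ["the", "military", "obey"], TOY_SYNONYMS))
# ([(0, 0), (1, 1), (2, 2)], [], [])
```

Note the direction of the lookup: synonyms are expanded from the hypothesis word and compared against the reference word, as in the description above.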