nltk.corpus.reader.opinion_lexicon module
CorpusReader for the Opinion Lexicon.
Opinion Lexicon information
- Authors: Minqing Hu and Bing Liu, 2004.
Department of Computer Science, University of Illinois at Chicago
- Contact: Bing Liu, liub@cs.uic.edu
Distributed with permission.
Related papers:
- Minqing Hu and Bing Liu. “Mining and summarizing customer reviews”.
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-04), Aug 22-25, 2004, Seattle, Washington, USA.
- Bing Liu, Minqing Hu and Junsheng Cheng. “Opinion Observer: Analyzing and Comparing Opinions on the Web”.
Proceedings of the 14th International World Wide Web conference (WWW-2005), May 10-14, 2005, Chiba, Japan.
- class nltk.corpus.reader.opinion_lexicon.IgnoreReadmeCorpusView
Bases: StreamBackedCorpusView
This CorpusView is used to skip the initial readme block of the corpus.
- __init__(*args, **kwargs)
Create a new corpus view, based on the file fileid, and read with block_reader. See the class documentation for more information.
- Parameters
fileid – The path to the file that is read by this corpus view. fileid can either be a string or a PathPointer.
startpos – The file position at which the view will start reading. This can be used to skip over preface sections.
encoding – The unicode encoding that should be used to read the file’s contents. If no encoding is specified, then the file’s contents will be read as a non-unicode string (i.e., a str).
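The readme-skipping idea can be illustrated without NLTK. The following is a minimal sketch, not the library's implementation: it assumes the readme is the first block of the file and ends at the first blank line. The helpers `skip_readme_block` and `read_words` and the sample text are hypothetical and exist only for this illustration.

```python
import io

def skip_readme_block(stream):
    """Advance `stream` past its first blank-line-terminated block.

    Returns the stream position right after that block, which a view
    could then treat as its starting position.
    """
    seen_content = False
    while True:
        line = stream.readline()
        if not line:            # end of file: nothing left to skip
            break
        if line.strip():
            seen_content = True
        elif seen_content:      # first blank line after some content
            break
    return stream.tell()

def read_words(stream):
    """Return the remaining non-blank lines as a list of words."""
    return [line.strip() for line in stream if line.strip()]

# Hypothetical file contents: a two-line readme, a blank line, then words.
sample = io.StringIO(
    "; Hypothetical readme header\n"
    "; describing the lexicon\n"
    "\n"
    "abnormal\n"
    "\n"
    "abolish\n"
)
start = skip_readme_block(sample)
words = read_words(sample)
```

Here `start` plays the role that startpos plays in the parameter list above: it marks where reading resumes once the preface has been skipped.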
- class nltk.corpus.reader.opinion_lexicon.OpinionLexiconCorpusReader
Bases: WordListCorpusReader
Reader for the Liu and Hu opinion lexicon. Blank lines and the readme block are ignored.
>>> from nltk.corpus import opinion_lexicon
>>> opinion_lexicon.words()
['2-faced', '2-faces', 'abnormal', 'abolish', ...]
The OpinionLexiconCorpusReader provides shortcuts to retrieve positive/negative words:
>>> opinion_lexicon.negative()
['2-faced', '2-faces', 'abnormal', 'abolish', ...]
Note that the list returned by words() follows file order (one file after another), so it is not in overall alphabetical order:
>>> opinion_lexicon.words()[0:10]
['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable', 'abominably', 'abominate', 'abomination', 'abort', 'aborted']
>>> sorted(opinion_lexicon.words())[0:10]
['2-faced', '2-faces', 'a+', 'abnormal', 'abolish', 'abominable', 'abominably', 'abominate', 'abomination', 'abort']
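The ordering difference can be reproduced with plain lists. This sketch assumes the negative file is read before the positive one, which the outputs above suggest but do not state outright. The two short lists are stand-ins for the real ones, and 'abound' is a placeholder entry chosen for illustration.

```python
# Illustrative stand-ins for the two word lists; the real lists are much longer.
negative_words = ["2-faced", "2-faces", "abnormal", "abolish"]
positive_words = ["a+", "abound"]  # 'abound' is a placeholder assumption

# Concatenating file by file keeps each list's internal order,
# but the combined list is not globally sorted.
combined = negative_words + positive_words

# Sorting the combined list interleaves entries from both files.
globally_sorted = sorted(combined)
```

In `combined`, 'a+' appears only after every negative entry, while in `globally_sorted` it moves up between '2-faces' and 'abnormal', matching the pattern in the doctest.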
- CorpusView
alias of IgnoreReadmeCorpusView
- negative()
Return all negative words in alphabetical order.
- Returns
a list of negative words.
- Return type
list(str)
- positive()
Return all positive words in alphabetical order.
- Returns
a list of positive words.
- Return type
list(str)
- words(fileids=None)
Return all words in the opinion lexicon. Note that these words are not sorted in alphabetical order.
- Parameters
fileids – a list or regexp specifying the ids of the files whose words should be returned.
- Returns
the given file(s) as a list of words and punctuation symbols.
- Return type
list(str)
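One possible use of the lists returned by positive() and negative() is a simple word-count polarity score. The sketch below is only an illustration, not a method taken from the related papers or from NLTK: `polarity_score` is a hypothetical helper, and tiny hand-written lists stand in for the lexicon so the example runs without the corpus data.

```python
def polarity_score(tokens, positive_words, negative_words):
    """Return (#positive tokens) - (#negative tokens), ignoring case."""
    pos = set(positive_words)   # sets give fast membership tests
    neg = set(negative_words)
    score = 0
    for token in tokens:
        t = token.lower()
        if t in pos:
            score += 1
        elif t in neg:
            score -= 1
    return score

# With NLTK and the corpus data installed, the lists could instead come from:
#   from nltk.corpus import opinion_lexicon
#   positive_words = opinion_lexicon.positive()
#   negative_words = opinion_lexicon.negative()
# Tiny hand-written lists are used here so the sketch runs on its own.
demo_positive = ["good", "great"]
demo_negative = ["bad", "abnormal"]

tokens = "The plot was good but the ending was bad and abnormal".split()
score = polarity_score(tokens, demo_positive, demo_negative)
```

Converting the lists to sets once up front avoids scanning a long word list for every token.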