nltk.parse.corenlp module
- class nltk.parse.corenlp.CoreNLPDependencyParser

Bases: GenericCoreNLPParser
Dependency parser.
Skip these tests if CoreNLP is likely not ready.

>>> from nltk.test.setup_fixt import check_jar
>>> check_jar(CoreNLPServer._JAR, env_vars=("CORENLP",), is_regex=True)
The recommended usage of CoreNLPDependencyParser is using the context manager notation:

>>> with CoreNLPServer() as server:
...     dep_parser = CoreNLPDependencyParser(url=server.url)
...     parse, = dep_parser.raw_parse(
...         'The quick brown fox jumps over the lazy dog.'
...     )
...     print(parse.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
The     DT      4       det
quick   JJ      4       amod
brown   JJ      4       amod
fox     NN      5       nsubj
jumps   VBZ     0       ROOT
over    IN      9       case
the     DT      9       det
lazy    JJ      9       amod
dog     NN      5       obl
.       .       5       punct
Alternatively, the server can be started using the following notation. Note that CoreNLPServer does not need to be used if the CoreNLP server is started outside of Python.

>>> server = CoreNLPServer()
>>> server.start()
>>> dep_parser = CoreNLPDependencyParser(url=server.url)
>>> parse, = dep_parser.raw_parse('The quick brown fox jumps over the lazy dog.')
>>> print(parse.tree())  # doctest: +NORMALIZE_WHITESPACE
(jumps (fox The quick brown) (dog over the lazy) .)
>>> for governor, dep, dependent in parse.triples():
...     print(governor, dep, dependent)
('jumps', 'VBZ') nsubj ('fox', 'NN')
('fox', 'NN') det ('The', 'DT')
('fox', 'NN') amod ('quick', 'JJ')
('fox', 'NN') amod ('brown', 'JJ')
('jumps', 'VBZ') obl ('dog', 'NN')
('dog', 'NN') case ('over', 'IN')
('dog', 'NN') det ('the', 'DT')
('dog', 'NN') amod ('lazy', 'JJ')
('jumps', 'VBZ') punct ('.', '.')
>>> (parse_fox, ), (parse_wolf, ) = dep_parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )
>>> print(parse_fox.to_conll(4))
The     DT      4       det
quick   JJ      4       amod
brown   JJ      4       amod
fox     NN      5       nsubj
jumps   VBZ     0       ROOT
over    IN      9       case
the     DT      9       det
lazy    JJ      9       amod
dog     NN      5       obl
.       .       5       punct

>>> print(parse_wolf.to_conll(4))
The     DT      4       det
quick   JJ      4       amod
grey    JJ      4       amod
wolf    NN      5       nsubj
jumps   VBZ     0       ROOT
over    IN      9       case
the     DT      9       det
lazy    JJ      9       amod
fox     NN      5       obl
.       .       5       punct
>>> (parse_dog, ), (parse_friends, ) = dep_parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )
>>> print(parse_dog.to_conll(4))
I       PRP     4       nsubj
'm      VBP     4       cop
a       DT      4       det
dog     NN      0       ROOT
>>> print(parse_friends.to_conll(4))
This    DT      6       nsubj
is      VBZ     6       cop
my      PRP$    4       nmod:poss
friends NNS     6       nmod:poss
'       POS     4       case
cat     NN      0       ROOT
(       -LRB-   9       punct
the     DT      9       det
tabby   NN      6       dep
)       -RRB-   9       punct
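The -LRB- and -RRB- tags in the output above are Penn Treebank escapes for the literal ( and ) characters, which CoreNLP's tokenizer emits to keep brackets distinct from constituency notation. As a rough illustration of that convention (unescape_ptb is a hypothetical display helper, not part of NLTK), the escapes can be mapped back with plain Python:

```python
# Penn Treebank bracket escapes, as emitted by CoreNLP's tokenizer.
# NOTE: unescape_ptb is a hypothetical helper for display, not an NLTK API.
PTB_BRACKETS = {
    "-LRB-": "(", "-RRB-": ")",
    "-LSB-": "[", "-RSB-": "]",
    "-LCB-": "{", "-RCB-": "}",
}

def unescape_ptb(token):
    """Map a PTB bracket escape back to its literal character."""
    return PTB_BRACKETS.get(token, token)

print([unescape_ptb(t) for t in ["-LRB-", "the", "tabby", "-RRB-"]])
# ['(', 'the', 'tabby', ')']
```

This is only worth doing when rendering tokens back to users; the escaped forms should be kept when feeding tokens back into CoreNLP.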
>>> parse_john, parse_mary, = dep_parser.parse_text(
...     'John loves Mary. Mary walks.'
... )
>>> print(parse_john.to_conll(4))
John    NNP     2       nsubj
loves   VBZ     0       ROOT
Mary    NNP     2       obj
.       .       2       punct
>>> print(parse_mary.to_conll(4))
Mary    NNP     2       nsubj
walks   VBZ     0       ROOT
.       .       2       punct
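The 4-column to_conll(4) format used throughout these examples encodes the whole dependency graph: word, POS tag, 1-based head index (0 for the root), and relation label. As an illustrative sketch of the format only (conll4_triples is a hypothetical helper, not an NLTK API, and it yields triples in token order rather than the traversal order used by parse.triples()), the governor/relation/dependent triples can be recovered with plain Python:

```python
# Recover (governor, relation, dependent) triples from 4-column CoNLL text.
# NOTE: conll4_triples is an illustrative sketch of the format, not an NLTK API.
def conll4_triples(conll_text):
    rows = [line.split() for line in conll_text.strip().splitlines()]
    triples = []
    for word, tag, head, rel in rows:
        head = int(head)
        if head == 0:
            continue  # the ROOT word has no governor
        gov_word, gov_tag = rows[head - 1][0], rows[head - 1][1]
        triples.append(((gov_word, gov_tag), rel, (word, tag)))
    return triples

conll = """\
The DT 4 det
quick JJ 4 amod
brown JJ 4 amod
fox NN 5 nsubj
jumps VBZ 0 ROOT
over IN 9 case
the DT 9 det
lazy JJ 9 amod
dog NN 5 obl
. . 5 punct"""

for gov, rel, dep in conll4_triples(conll):
    print(gov, rel, dep)
```

The same nine relations shown by parse.triples() for this sentence come out, just ordered by dependent token rather than by governor.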
Special cases
Non-breaking space inside a token.
>>> len(
...     next(
...         dep_parser.raw_parse(
...             'Anhalt said children typically treat a 20-ounce soda bottle as one '
...             'serving, while it actually contains 2 1/2 servings.'
...         )
...     ).nodes
... )
23
Phone numbers.
>>> len(
...     next(
...         dep_parser.raw_parse('This is not going to crash: 01 111 555.')
...     ).nodes
... )
10
>>> print(
...     next(
...         dep_parser.raw_parse('The underscore _ should not simply disappear.')
...     ).to_conll(4)
... )
The        DT   2       det
underscore NN   7       nsubj
_          NFP  7       punct
should     MD   7       aux
not        RB   7       advmod
simply     RB   7       advmod
disappear  VB   0       ROOT
.          .    7       punct
>>> print(
...     next(
...         dep_parser.raw_parse(
...             'for all of its insights into the dream world of teen life , and its electronic expression through '
...             'cyber culture , the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 '
...             '1/2-hour running time .'
...         )
...     ).to_conll(4)
... )
for      IN     2       case
all      DT     24      obl
of       IN     5       case
its      PRP$   5       nmod:poss
insights NNS    2       nmod
into     IN     9       case
the      DT     9       det
dream    NN     9       compound
world    NN     5       nmod
of       IN     12      case
teen     NN     12      compound
...
>>> server.stop()
- parser_annotator = 'depparse'
- class nltk.parse.corenlp.CoreNLPParser

Bases: GenericCoreNLPParser
Skip these tests if CoreNLP is likely not ready.

>>> from nltk.test.setup_fixt import check_jar
>>> check_jar(CoreNLPServer._JAR, env_vars=("CORENLP",), is_regex=True)
The recommended usage of CoreNLPParser is using the context manager notation:

>>> with CoreNLPServer() as server:
...     parser = CoreNLPParser(url=server.url)
...     next(
...         parser.raw_parse('The quick brown fox jumps over the lazy dog.')
...     ).pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                          VP              |
      |                 _________|___            |
      |                |             PP          |
      |                |     ________|___        |
      NP               |    |            NP      |
  ____|__________      |    |     _______|____   |
 DT   JJ    JJ   NN   VBZ   IN   DT      JJ   NN  .
 |    |     |    |     |    |    |       |    |   |
The quick brown fox jumps over the     lazy  dog  .
Alternatively, the server can be started using the following notation. Note that CoreNLPServer does not need to be used if the CoreNLP server is started outside of Python.

>>> server = CoreNLPServer()
>>> server.start()
>>> parser = CoreNLPParser(url=server.url)
>>> (parse_fox, ), (parse_wolf, ) = parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )
>>> parse_fox.pretty_print()
                     ROOT
                      |
                      S
       _______________|__________________________
      |                          VP              |
      |                 _________|___            |
      |                |             PP          |
      |                |     ________|___        |
      NP               |    |            NP      |
  ____|__________      |    |     _______|____   |
 DT   JJ    JJ   NN   VBZ   IN   DT      JJ   NN  .
 |    |     |    |     |    |    |       |    |   |
The quick brown fox jumps over the     lazy  dog  .
>>> parse_wolf.pretty_print()
                     ROOT
                      |
                      S
       _______________|__________________________
      |                          VP              |
      |                 _________|___            |
      |                |             PP          |
      |                |     ________|___        |
      NP               |    |            NP      |
  ____|_________       |    |     _______|____   |
 DT   JJ   JJ   NN    VBZ   IN   DT      JJ   NN  .
 |    |    |    |      |    |    |       |    |   |
The quick grey wolf jumps over the     lazy  fox  .
>>> (parse_dog, ), (parse_friends, ) = parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )
>>> parse_dog.pretty_print()
        ROOT
         |
         S
  _______|____
 |            VP
 |    ________|___
 NP  |            NP
 |   |         ___|___
PRP VBP       DT      NN
 |   |        |       |
 I   'm       a      dog
>>> parse_friends.pretty_print()
[ASCII rendering of the constituency tree
 (ROOT (S (NP (DT This))
          (VP (VBZ is)
              (NP (NP (NP (PRP$ my) (NNS friends) (POS ')) (NN cat))
                  (-LRB- -LRB-)
                  (NP (DT the) (NN tabby))
                  (-RRB- -RRB-)))))]
>>> parse_john, parse_mary, = parser.parse_text(
...     'John loves Mary. Mary walks.'
... )
>>> parse_john.pretty_print()
      ROOT
       |
       S
  _____|_____________
 |          VP       |
 |      ____|___     |
 NP    |        NP   |
 |     |        |    |
NNP   VBZ      NNP   .
 |     |        |    |
John loves     Mary  .
>>> parse_mary.pretty_print()
      ROOT
       |
       S
  _____|____
 NP    VP   |
 |     |    |
NNP   VBZ   .
 |     |    |
Mary walks  .
Special cases
>>> next(
...     parser.raw_parse(
...         'NASIRIYA, Iraq—Iraqi doctors who treated former prisoner of war '
...         'Jessica Lynch have angrily dismissed claims made in her biography '
...         'that she was raped by her Iraqi captors.'
...     )
... ).height()
14
>>> next(
...     parser.raw_parse(
...         "The broader Standard & Poor's 500 Index <.SPX> was 0.46 points lower, or "
...         '0.05 percent, at 997.02.'
...     )
... ).height()
11
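The .height() values above follow NLTK's Tree.height() convention: a bare leaf has height 1, a node over only leaves has height 2, and each further level of nesting adds one. A minimal stdlib sketch of the same recursion over nested tuples (a hypothetical representation for illustration, not the nltk.tree.Tree class):

```python
# Tree height in the NLTK sense: a bare leaf (string) has height 1,
# and an internal node is one more than its tallest child.
# NOTE: nested tuples stand in for nltk.tree.Tree here; this is a sketch.
def height(tree):
    if isinstance(tree, str):  # leaf token
        return 1
    label, *children = tree
    return 1 + max(height(c) for c in children)

# The fox sentence's constituency tree from the earlier pretty_print() example:
# (ROOT (S (NP ...) (VP (VBZ jumps) (PP (IN over) (NP ...))) (. .)))
fox = ("ROOT",
       ("S",
        ("NP", ("DT", "The"), ("JJ", "quick"), ("JJ", "brown"), ("NN", "fox")),
        ("VP", ("VBZ", "jumps"),
               ("PP", ("IN", "over"),
                      ("NP", ("DT", "the"), ("JJ", "lazy"), ("NN", "dog")))),
        (".", ".")))

print(height(fox))  # 7: ROOT > S > VP > PP > NP > DT > leaf
```

The deeper heights in the doctests above (14 and 11) simply reflect longer chains of nested phrases in those sentences.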
>>> server.stop()
- parser_annotator = 'parse'
- class nltk.parse.corenlp.CoreNLPServer

Bases: object
- __init__(path_to_jar=None, path_to_models_jar=None, verbose=False, java_options=None, corenlp_options=None, port=None)
- exception nltk.parse.corenlp.CoreNLPServerError

Bases: OSError

Exceptions associated with the CoreNLP server.
- class nltk.parse.corenlp.GenericCoreNLPParser

Bases: ParserI, TokenizerI, TaggerI
Interface to the CoreNLP Parser.
- parse_sents(sentences, *args, **kwargs)
Parse multiple sentences.
Takes multiple sentences as a list where each sentence is a list of words. Each sentence will be automatically tagged with this CoreNLPParser instance's tagger.
If whitespace exists inside a token, the token will be treated as several tokens.
- Parameters
sentences (list(list(str))) – Input sentences to parse
- Return type
iter(iter(Tree))
- parse_text(text, *args, **kwargs)
Parse a piece of text.
The text might contain several sentences which will be split by CoreNLP.
- Parameters
text (str) – text to be split into sentences and parsed.
- Returns
an iterable of syntactic structures, one per sentence.
- raw_parse(sentence, properties=None, *args, **kwargs)
Parse a sentence.
Takes a sentence as a string; before parsing, it will be automatically tokenized and tagged by the CoreNLP Parser.
- Parameters
sentence (str) – Input sentence to parse
- Return type
iter(Tree)
- raw_parse_sents(sentences, verbose=False, properties=None, *args, **kwargs)
Parse multiple sentences.
Takes multiple sentences as a list of strings. Each sentence will be automatically tokenized and tagged.
- Parameters
sentences (list(str)) – Input sentences to parse.
- Return type
iter(iter(Tree))
- raw_tag_sents(sentences)
Tag multiple sentences.
Takes multiple sentences as a list where each sentence is a string.
- Parameters
sentences (list(str)) – Input sentences to tag
- Return type
list(list(list(tuple(str, str))))
- tag(sentence: str) → List[Tuple[str, str]]
Tag a list of tokens.
- Return type
list(tuple(str, str))
- Parameters
sentence (str) – Input sentence to tag
Skip these tests if CoreNLP is likely not ready.

>>> from nltk.test.setup_fixt import check_jar
>>> check_jar(CoreNLPServer._JAR, env_vars=("CORENLP",), is_regex=True)
The CoreNLP server can be started using the following notation, although we recommend the with CoreNLPServer() as server: context manager notation to ensure that the server is always stopped.

>>> server = CoreNLPServer()
>>> server.start()
>>> parser = CoreNLPParser(url=server.url, tagtype='ner')
>>> tokens = 'Rami Eid is studying at Stony Brook University in NY'.split()
>>> parser.tag(tokens)  # doctest: +NORMALIZE_WHITESPACE
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'STATE_OR_PROVINCE')]
>>> parser = CoreNLPParser(url=server.url, tagtype='pos')
>>> tokens = "What is the airspeed of an unladen swallow ?".split()
>>> parser.tag(tokens)
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]
>>> server.stop()
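The tagtype='ner' output above is one (token, label) pair per token, with 'O' marking non-entities. A common post-processing step is to merge adjacent tokens sharing a non-'O' label into entity spans; a stdlib sketch (ner_spans is a hypothetical helper built on itertools.groupby, not an NLTK API):

```python
from itertools import groupby

# Merge adjacent (token, label) pairs with the same non-'O' label
# into (entity_text, label) spans.
# NOTE: ner_spans is an illustrative helper, not part of NLTK.
def ner_spans(tagged):
    spans = []
    for label, group in groupby(tagged, key=lambda pair: pair[1]):
        if label != "O":
            spans.append((" ".join(tok for tok, _ in group), label))
    return spans

tagged = [("Rami", "PERSON"), ("Eid", "PERSON"), ("is", "O"),
          ("studying", "O"), ("at", "O"), ("Stony", "ORGANIZATION"),
          ("Brook", "ORGANIZATION"), ("University", "ORGANIZATION"),
          ("in", "O"), ("NY", "STATE_OR_PROVINCE")]

print(ner_spans(tagged))
# [('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'),
#  ('NY', 'STATE_OR_PROVINCE')]
```

One caveat of this naive grouping: two distinct adjacent entities with the same label would be merged into one span; BIO-style tagging schemes exist precisely to disambiguate that case.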
- tag_sents(sentences)
Tag multiple sentences.
Takes multiple sentences as a list where each sentence is a list of tokens.
- Parameters
sentences (list(list(str))) – Input sentences to tag
- Return type
list(list(tuple(str, str)))
- tokenize(text, properties=None)
Tokenize a string of text.
Skip these tests if CoreNLP is likely not ready.

>>> from nltk.test.setup_fixt import check_jar
>>> check_jar(CoreNLPServer._JAR, env_vars=("CORENLP",), is_regex=True)
The CoreNLP server can be started using the following notation, although we recommend the with CoreNLPServer() as server: context manager notation to ensure that the server is always stopped.

>>> server = CoreNLPServer()
>>> server.start()
>>> parser = CoreNLPParser(url=server.url)
>>> text = 'Good muffins cost $3.88\nin New York. Please buy me\ntwo of them.\nThanks.'
>>> list(parser.tokenize(text))
['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.', 'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']
>>> s = "The colour of the wall is blue."
>>> list(
...     parser.tokenize(
...         'The colour of the wall is blue.',
...         properties={'tokenize.options': 'americanize=true'},
...     )
... )
['The', 'colour', 'of', 'the', 'wall', 'is', 'blue', '.']
>>> server.stop()