Sunday, 4 August 2019

Escape parentheses in NLTK parse tree

In NLTK, Tree.fromstring converts a bracketed parse string into a Tree object. However, when a token itself contains parentheses, the result is not what you might expect, because NLTK reads those parentheses as the start and end of a new node.

As an example, take the sentence

They like(d) it a lot

This could be parsed as

(S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .))

But if you parse this string into a tree with NLTK and print it, the (d) is clearly treated as a new node, which is no surprise.

from nltk import Tree

s = '(S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .))'

tree = Tree.fromstring(s)
print(tree)

The result is

(S
  (NP (PRP They))
  (VP like (d ) (NP (PRP it)) (NP (DT a) (NN lot)))
  (. .))

So (d ) is a node inside the VP rather than part of the token like(d). Is there a way to escape parentheses in the tree parser?
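One possible workaround, offered as a sketch rather than a built-in escape mechanism: rewrite token-internal parentheses to the Penn Treebank style placeholders -LRB- and -RRB- before parsing, then restore them through the read_leaf argument of Tree.fromstring. The helper names (TOKEN_PARENS, escape_token_parens, unescape_leaf) are made up for this illustration. The regular expression assumes that a token-internal parenthesis is always glued to the preceding token characters, contains no whitespace, and is not nested.

```python
import re
from nltk import Tree

# Hypothetical helper pattern: matches a parenthesised chunk attached directly
# to preceding token characters, e.g. the "(d)" in "like(d)". Structural
# brackets follow whitespace, another bracket, or the start of the string,
# so the lookbehind skips them.
TOKEN_PARENS = re.compile(r"(?<=[^\s()])\(([^\s()]*)\)")

def escape_token_parens(s):
    """Replace token-internal parentheses with -LRB- / -RRB- placeholders."""
    return TOKEN_PARENS.sub(r"-LRB-\1-RRB-", s)

def unescape_leaf(leaf):
    """Turn the placeholders back into literal parentheses in a leaf."""
    return leaf.replace("-LRB-", "(").replace("-RRB-", ")")

s = '(S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .))'

# read_leaf is applied to every leaf token as the string is parsed.
tree = Tree.fromstring(escape_token_parens(s), read_leaf=unescape_leaf)
print(tree.leaves())
```

With this approach like(d) stays a single leaf of the VP. Note that printing the tree then shows the literal parentheses again, so the printed string cannot be fed straight back into Tree.fromstring; dropping read_leaf keeps the placeholders in the leaves if a round trip is needed.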


