In NLTK, a bracketed (parenthesized) tree string can be converted into an actual Tree object. However, when a token itself contains parentheses, the result is not what you might expect, because NLTK treats those parentheses as the start of a new node.
As an example, take the sentence
They like(d) it a lot
This could be parsed as
(S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .))
But if you parse this string into a tree with NLTK and print it, it is clear that the (d) is read as a new node, which is no surprise.
from nltk import Tree
s = '(S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .))'
tree = Tree.fromstring(s)
print(tree)
The result is
(S
  (NP (PRP They))
  (VP like (d ) (NP (PRP it)) (NP (DT a) (NN lot)))
  (. .))
So (d ) is a node inside the VP rather than part of the token like. Is there a way in the tree parser to escape parentheses?
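One possible workaround, sketched here under assumptions rather than as a built-in escape mechanism: replace the parentheses inside each token with the Penn Treebank style placeholders -LRB- and -RRB- before building the bracketed string, then restore them through the read_leaf argument of Tree.fromstring. The helper names escape_token, unescape_token and parse_escaped are hypothetical and not part of NLTK, and the sketch assumes you can escape the tokens before they are placed in the string.

```python
# Placeholders for literal parentheses inside tokens (Penn Treebank style).
LRB, RRB = "-LRB-", "-RRB-"


def escape_token(token):
    """Replace literal parentheses in a single token with placeholders."""
    return token.replace("(", LRB).replace(")", RRB)


def unescape_token(token):
    """Restore literal parentheses; meant to be passed as read_leaf."""
    return token.replace(LRB, "(").replace(RRB, ")")


def parse_escaped(bracketed):
    """Parse a bracketed string whose tokens were escaped beforehand."""
    # Imported lazily so the two helpers above work without NLTK installed.
    from nltk import Tree
    return Tree.fromstring(bracketed, read_leaf=unescape_token)


# The token is escaped *before* it goes into the bracketed string,
# so the only parentheses left in s are structural ones.
s = ("(S (NP (PRP They)) (VP %s (NP (PRP it)) (NP (DT a) (NN lot))) (. .))"
     % escape_token("like(d)"))
```

Calling parse_escaped(s) should then keep like(d) as a single leaf of the VP. Note that printing the resulting tree writes the token with its real parentheses again, so the printed form cannot be read back with Tree.fromstring without escaping it once more. If you control the input format, another option is the brackets argument of Tree.fromstring, for example brackets='[]' with a tree string written in square brackets, which leaves round parentheses free to appear inside tokens.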
from Escape parentheses in NLTK parse tree