Based on the grammar in chapter 7 of the NLTK Book:
grammar = r"""
NP: {<DT|JJ|NN.*>+} # ...
"""
I want to extend NP (noun phrase) so that it also covers several NPs joined by CC (a coordinating conjunction such as "and") or by "," (a comma), to capture noun phrases like:
- The house and tree
- The apple, orange and mango
- Car, house, and plane
I cannot get my modified grammar to capture those as a single NP:
import nltk
grammar = r"""
NP: {<DT|JJ|NN.*>+(<CC|,>+<NP>)?}
"""
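sentence = 'The house and tree'
chunkParser = nltk.RegexpParser(grammar)
# Tokenize and POS-tag the sentence (requires the NLTK tokenizer and tagger data)
words = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(words)
# Print the chunk tree produced by the grammar
print(chunkParser.parse(tagged))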
Results in:
(S (NP The/DT house/NN) and/CC (NP tree/NN))
I've also tried moving the NP to the beginning, NP: {(<NP><CC|,>+)?<DT|JJ|NN.*>+}, but I get the same result:
(S (NP The/DT house/NN) and/CC (NP tree/NN))
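One likely explanation, as far as I understand RegexpParser: a rule is matched against the sequence of POS tags, and a chunk label such as <NP> only shows up in that sequence after an earlier stage has already built those chunks. Inside the single NP rule there is no NP yet, so the optional group never matches. A minimal sketch of a possible workaround is a two-stage (cascaded) grammar, in the style of the cascaded chunkers in the same chapter. The label CNP, the function name chunk_coordinated and the hand-tagged input are assumptions made for this illustration, not something from the book.

```python
from nltk import RegexpParser

# Two-stage (cascaded) grammar. Stage 1 builds plain NP chunks; stage 2
# joins NP chunks separated by conjunctions and/or commas.
# "CNP" is a hypothetical label picked for this sketch.
GRAMMAR = r"""
NP: {<DT|JJ|NN.*>+}
CNP: {<NP>(<CC|,>+<NP>)+}
"""


def chunk_coordinated(tagged):
    """Chunk a list of (word, tag) pairs with the two-stage grammar."""
    return RegexpParser(GRAMMAR).parse(tagged)


# Hand-tagged input (an assumption), so no tokenizer or tagger data is
# needed; nltk.pos_tag may assign different tags on real text.
EXAMPLES = [
    [("The", "DT"), ("house", "NN"), ("and", "CC"), ("tree", "NN")],
    [("The", "DT"), ("apple", "NN"), (",", ","), ("orange", "NN"),
     ("and", "CC"), ("mango", "NN")],
    [("Car", "NN"), (",", ","), ("house", "NN"), (",", ","),
     ("and", "CC"), ("plane", "NN")],
]

if __name__ == "__main__":
    for tagged in EXAMPLES:
        print(chunk_coordinated(tagged))
```

With these tags, the first example should come back as a single coordinated chunk, (S (CNP (NP The/DT house/NN) and/CC (NP tree/NN))). The inner NP nodes are kept, so this is a nested phrase rather than one flat NP. Note that any line of the grammar string containing a colon appears to be read as the start of a new stage, so comments inside the grammar are best kept free of colons.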
from Recursion in nltk's RegexpParser