I want to extract specific phrases from text with a help of NLTK RegexpParser. Is there a way to combine exact word in pos_tags?
For example, this is my text:
import nltk
text = "Samle Text and sample Text and text With University of California and Institute for Technology with SAPLE TEXT"
tokens = nltk.word_tokenize(text)
tagged_text = nltk.pos_tag(tokens)
regex = "ENTITY:{<University|Institute><for|of><NNP|NN>}"
# searching by regex that is defined
entity_search = nltk.RegexpParser(regex)
entity_result = entity_search.parse(tagged_text)
entity_result = list(entity_result)
print(entity_result)
Ofc, I have a lot of different combinations of words that I want to use in my "ENTITY" regex, and I have much longer text. Is there a way to make it work? FYI, I want to make it work with RegexpParser, I do not want to use regular regexes.
from How to use exact words in NLTK RegexpParser
No comments:
Post a Comment