Monday, 13 February 2023

How to use exact words in NLTK RegexpParser

I want to extract specific phrases from text with the help of NLTK's RegexpParser. Is there a way to combine exact words with POS tags in a chunk pattern?

For example, here is my code:

import nltk

text = "Samle Text and sample Text and text With University of California and Institute for Technology with SAPLE TEXT"

tokens = nltk.word_tokenize(text)
tagged_text = nltk.pos_tag(tokens)

# intended: the word University or Institute, then for or of, then a noun
regex = "ENTITY:{<University|Institute><for|of><NNP|NN>}"

# searching by regex that is defined
entity_search = nltk.RegexpParser(regex)
entity_result = entity_search.parse(tagged_text)
entity_result = list(entity_result)
print(entity_result)
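One likely reason the pattern above finds nothing: RegexpParser compares what is inside each `<...>` only against the POS tags, never against the words, so `<University|Institute>` can only match a token whose tag is literally `University` or `Institute`. A minimal check, assuming hand-written tags in place of `nltk.pos_tag` output (which would need the tokenizer and tagger data downloaded):

```python
from nltk import RegexpParser

# Hand-written tags (an assumption), standing in for nltk.pos_tag output
tagged_text = [("University", "NNP"), ("of", "IN"), ("California", "NNP")]

parser = RegexpParser("ENTITY: {<University|Institute><for|of><NNP|NN>}")
tree = parser.parse(tagged_text)

# The tag patterns are compared with "NNP", "IN", "NNP", so nothing is chunked
found = [t for t in tree.subtrees() if t.label() == "ENTITY"]
print(found)  # []
```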

Of course, I have many different word combinations that I want to use in my ENTITY pattern, and my real text is much longer. Is there a way to make this work? For what it's worth, I want to do this with RegexpParser, not with plain regular expressions.
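One possible workaround, sketched under stated assumptions and still using RegexpParser: before parsing, replace the tag of each keyword with the word itself, so a tag pattern such as `<University>` can match it. The keyword lists, the helper `words_as_tags`, and the hand-written tags below are hypothetical illustrations, not part of NLTK; the grammar string is built from the lists so that new combinations only need to be added in one place.

```python
from nltk import RegexpParser

# Hypothetical keyword lists; extend them with your own combinations
HEAD_WORDS = ["University", "Institute"]
LINK_WORDS = ["for", "of"]
KEYWORDS = set(HEAD_WORDS) | set(LINK_WORDS)

def words_as_tags(tagged, keywords):
    """Hypothetical helper: replace the POS tag of every keyword with the
    word itself (case-sensitive), leaving all other tags unchanged."""
    return [(word, word if word in keywords else tag) for word, tag in tagged]

# Build the grammar from the lists instead of hard-coding every alternative
grammar = "ENTITY: {<%s><%s><NNP|NN>}" % ("|".join(HEAD_WORDS), "|".join(LINK_WORDS))
parser = RegexpParser(grammar)

# Hand-written tags (an assumption), standing in for nltk.pos_tag output
tagged_text = [
    ("with", "IN"), ("University", "NNP"), ("of", "IN"), ("California", "NNP"),
    ("and", "CC"), ("Institute", "NNP"), ("for", "IN"), ("Technology", "NNP"),
]

tree = parser.parse(words_as_tags(tagged_text, KEYWORDS))

# Join the words of each ENTITY chunk back into a phrase
entities = [
    " ".join(word for word, _ in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "ENTITY")
]
print(entities)  # ['University of California', 'Institute for Technology']
```

With real text, the output of `nltk.pos_tag(nltk.word_tokenize(text))` would be passed to `words_as_tags` instead of the hand-written list. The trade-off of this design is that keyword tokens lose their original POS tag inside the tree, so any other rule in the grammar that expects `<IN>` will no longer see `of` or `for`, and the matching is case-sensitive.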



from How to use exact words in NLTK RegexpParser
