Hemant Vishwakarma: How to use exact words in NLTK RegexpParser

Monday, 13 February 2023

How to use exact words in NLTK RegexpParser

I want to extract specific phrases from text with the help of NLTK's RegexpParser. Is there a way to combine exact words with POS tags in a chunk pattern?

For example, here is my text and the code I am using:

import nltk

text = "Samle Text and sample Text and text With University of California and Institute for Technology with SAPLE TEXT"

tokens = nltk.word_tokenize(text)
tagged_text = nltk.pos_tag(tokens)

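# Intended pattern: the literal word "University" or "Institute",
# then "for" or "of", then a noun (NNP or NN)
regex = "ENTITY:{<University|Institute><for|of><NNP|NN>}"

# Chunk the tagged tokens using the grammar defined above
entity_search = nltk.RegexpParser(regex)
entity_result = entity_search.parse(tagged_text)
# Flatten the top level of the tree into a list of tokens and subtrees
entity_result = list(entity_result)
print(entity_result)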

Of course, I have many more word combinations that I want to use in my "ENTITY" pattern, and my real text is much longer. Is there a way to make this work? Note that I want to do this with RegexpParser; I do not want to fall back to plain regular expressions.
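One possible workaround, offered as a sketch rather than a definitive answer: RegexpParser tag patterns are matched against the tag of each token, not its text, so a word can only be matched literally if it appears in the tag position. The example below retags a small set of keywords so that each keyword's tag is the word itself, after which the pattern from the question can be used unchanged. The names `KEYWORDS` and `retag_keywords` are hypothetical, and the tagged tokens are hand-written stand-ins for the output of `nltk.pos_tag(nltk.word_tokenize(text))`, so that the sketch runs without downloading tagger models.

```python
import nltk

# Hypothetical keyword set: words that should be matched literally.
KEYWORDS = {"University", "Institute", "of", "for"}


def retag_keywords(tagged_tokens, keywords=KEYWORDS):
    """Replace the POS tag of every keyword with the word itself."""
    return [
        (word, word if word in keywords else tag)
        for word, tag in tagged_tokens
    ]


# Assumption: hand-written tags for part of the example sentence,
# standing in for nltk.pos_tag(nltk.word_tokenize(text)).
tagged_text = [
    ("text", "NN"), ("With", "IN"),
    ("University", "NNP"), ("of", "IN"), ("California", "NNP"),
    ("and", "CC"),
    ("Institute", "NNP"), ("for", "IN"), ("Technology", "NNP"),
    ("with", "IN"), ("SAPLE", "NNP"), ("TEXT", "NNP"),
]

# Same pattern as in the question; the word alternatives now match
# because the keywords carry themselves as tags.
grammar = "ENTITY: {<University|Institute><for|of><NNP|NN>}"
parser = nltk.RegexpParser(grammar)
tree = parser.parse(retag_keywords(tagged_text))

# Collect the words of every ENTITY chunk found in the tree.
entities = [
    " ".join(word for word, _tag in subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == "ENTITY")
]
print(entities)
```

Note that retagging discards the original POS tag of each keyword, so other rules that rely on, say, `IN` for "of" would no longer see it. If that matters, a combined tag such as `IN_of` could be used instead, with the pattern adjusted to match.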

from How to use exact words in NLTK RegexpParser
