I am currently using Python's re module to search and capture groups. I have a list of regular expressions that I have to compile and match against a large dataset, which causes performance issues.
Example:
REGEXES = [
    r'^New York(?P<grp1>\d+/\d+): (?P<grp2>.+)$',
    r'^Ohio (?P<grp1>\d+/\d+/\d+): (?P<grp2>.+)$',
    r'(?P<year>\d{4}-\d{1,2}-\d{1,2})$',
    r'^(?P<year>\d{1,2}/\d{1,2}/\d{2,4})$',
    r'^(?P<title>.+?)[- ]+E(?P<epi>\d+)$',
    # ...
]
Note: The regexes are not similar to one another.
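import re

# Compile every pattern once, case-insensitively.
COMPILED_REGEXES = [re.compile(r, flags=re.I) for r in REGEXES]

def find_match(string):
    # Return the first match found, trying the regexes in list order.
    for regex in COMPILED_REGEXES:
        match = regex.search(string)
        if match:
            return match
    # No regex matched.
    return None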
Is there a way around this? The idea is to avoid iterating over the compiled regexes one by one to get a match.
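One possible direction, shown only as a rough sketch: join all the patterns into a single alternation so the regex engine makes one search call per string. The helper names here (`_prefix_groups`, `COMBINED`, `find_match_combined`) are made up for illustration, and only the five patterns listed above are used. The sketch assumes the patterns contain no named backreferences such as `(?P=name)` and no literal `(?P<` text, because it renames groups with a plain substitution. The result is not guaranteed to be identical to the loop: a combined search returns the leftmost match across all alternatives, not the first regex in list order that matches somewhere. Whether it is actually faster would have to be measured on the real data.

```python
import re

# First five patterns from the question; the rest of the list is elided there.
REGEXES = [
    r'^New York(?P<grp1>\d+/\d+): (?P<grp2>.+)$',
    r'^Ohio (?P<grp1>\d+/\d+/\d+): (?P<grp2>.+)$',
    r'(?P<year>\d{4}-\d{1,2}-\d{1,2})$',
    r'^(?P<year>\d{1,2}/\d{1,2}/\d{2,4})$',
    r'^(?P<title>.+?)[- ]+E(?P<epi>\d+)$',
]

def _prefix_groups(pattern, index):
    # Hypothetical helper: rename (?P<name>...) to (?P<r{index}_name>...)
    # so that group names stay unique inside the combined pattern.
    return re.sub(
        r'\(\?P<(\w+)>',
        lambda m: f'(?P<r{index}_{m.group(1)}>',
        pattern,
    )

# Wrap each pattern in an outer group named r0, r1, ... and join with '|'.
COMBINED = re.compile(
    '|'.join(
        f'(?P<r{i}>{_prefix_groups(p, i)})' for i, p in enumerate(REGEXES)
    ),
    flags=re.I,
)

def find_match_combined(string):
    # Return (index of the matching pattern, its named groups), or None.
    match = COMBINED.search(string)
    if match is None:
        return None
    # The outer wrapper group closes last, so lastgroup names it.
    index = int(match.lastgroup[1:])
    prefix = f'r{index}_'
    groups = {
        name[len(prefix):]: value
        for name, value in match.groupdict().items()
        if name.startswith(prefix)
    }
    return index, groups
```

For example, `find_match_combined('Ohio 1/2/2020: hello')` reports pattern index 1 along with its `grp1` and `grp2` values under their original names.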