Monday, 15 October 2018

Python Regular Expressions to NFA

I am currently using Python's re module to search text and capture groups. I have a list of regular expressions that I have to compile and match against a large dataset, and this causes performance issues.

Example:

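import re

# Raw strings keep backslashes such as \d intact; note the comma after every entry.
REGEXES = [
    r'^New York(?P<grp1>\d+/\d+): (?P<grp2>.+)$',
    r'^Ohio (?P<grp1>\d+/\d+/\d+): (?P<grp2>.+)$',
    r'(?P<year>\d{4}-\d{1,2}-\d{1,2})$',
    r'^(?P<year>\d{1,2}/\d{1,2}/\d{2,4})$',
    r'^(?P<title>.+?)[- ]+E(?P<epi>\d+)$',
    # ... more patterns
]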

Note: the regexes are not similar to one another.

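# Compile every pattern once, case-insensitively.
COMPILED_REGEXES = [re.compile(r, flags=re.I) for r in REGEXES]

def find_match(string):
    # Return the first match in list order, or None if no pattern matches.
    for regex in COMPILED_REGEXES:
        match = regex.search(string)
        if match:
            return match
    return None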

Is there a way around this? The idea is to avoid iterating through the list of compiled regexes one by one to find a match.
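One possible direction, sketched under assumptions: Python's re is a backtracking engine and does not build a single combined NFA/DFA from a list of patterns, but the patterns can be joined into one alternation and compiled once. A single pattern may not reuse a group name, so the hypothetical helper `combine` below prefixes each pattern's named groups (e.g. `p0_grp1`) and wraps each pattern in its own group `p0`, `p1`, and so on. `Match.lastgroup` then names the wrapper that matched, because the outermost group closes last. This assumes the patterns contain no backreferences and no inline flags such as `(?i)`. It is not guaranteed to be faster, since the engine still tries the alternatives in turn, so measure on the real data. There is also a semantic difference: `search` returns the leftmost match in the string, and list order only decides among alternatives that match at the same position.

```python
import re

REGEXES = [
    r'^New York(?P<grp1>\d+/\d+): (?P<grp2>.+)$',
    r'^Ohio (?P<grp1>\d+/\d+/\d+): (?P<grp2>.+)$',
    r'(?P<year>\d{4}-\d{1,2}-\d{1,2})$',
    r'^(?P<year>\d{1,2}/\d{1,2}/\d{2,4})$',
    r'^(?P<title>.+?)[- ]+E(?P<epi>\d+)$',
]

# Matches the opening of a named group, e.g. "(?P<grp1>".
_NAMED_GROUP = re.compile(r'\(\?P<(\w+)>')


def combine(patterns, flags=0):
    """Hypothetical helper: join patterns into one alternation.

    Assumes no backreferences (?P=name) and no inline flags such as (?i).
    """
    alternatives = []
    for i, pattern in enumerate(patterns):
        # Prefix group names so duplicates like "year" do not clash.
        renamed = _NAMED_GROUP.sub(
            lambda m, i=i: f'(?P<p{i}_{m.group(1)}>', pattern)
        # Wrap the whole pattern so we can tell which alternative matched.
        alternatives.append(f'(?P<p{i}>{renamed})')
    return re.compile('|'.join(alternatives), flags)


COMBINED = combine(REGEXES, flags=re.I)


def find_match(string):
    """Return (pattern_index, groups) for the match, or None."""
    match = COMBINED.search(string)
    if match is None:
        return None
    # The wrapper group closes last, so lastgroup is "p<index>".
    wrapper = match.lastgroup
    prefix = wrapper + '_'
    # Keep only this pattern's groups and strip the prefix again.
    groups = {name[len(prefix):]: value
              for name, value in match.groupdict().items()
              if name.startswith(prefix)}
    return int(wrapper[1:]), groups
```

In this sketch `find_match` returns a `(pattern_index, groups)` tuple instead of a `Match` object, with the group names restored to their original form, e.g. `find_match("Ohio 1/2/2018: storm")` gives `(1, {'grp1': '1/2/2018', 'grp2': 'storm'})`.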


