Sunday, 22 January 2023

Determine the shortest possible match of a regular expression

I have an randomly ordered array of regular expressions like this:

let patterns = [
    /foo+ba+r/,
    /foo/,
    /foo+bar/,
    /foobar/,
    /m[eo]{4,}w/,
    /boo/,
    /fooo*/,
    /meow/
]

I'm not sure if this is possible but I would like to write an algorithm which sorts the regular expressions from least greedy to most greedy, like this:

[
    /foo/,
    /boo/,
    /fooo*/,
    /meow/,
    /foobar/,
    /foo+bar/,
    /m[eo]{4,}w/,
    /foo+ba+r/
]

I would imagine such sorting could be achieved like so:

patterns.sort((p1, p2) { return p1.greediness() - p2.greediness() });

But there exists no method called greediness in the the RegExpr class.

Ideally, the greediness method would return the number of characters which could be possibly matched at minimum. i.e:

/foo/.greediness() == 3
/boo/.greediness() == 3
/fooo*/.greediness() == 3
/meow/.greediness() == 4
/foobar/.greediness() == 6
/foo+bar/.greediness() == 6
/m[eo]{4,}w/.greediness() == 6
/foo+ba+r/.greediness() == 6

What would your solution be to this problem?



from Determine the shortest possible match of a regular expression

No comments:

Post a Comment