Tuesday, 12 November 2019

Max Number of unique substrings from a partition

Modified the title a little bit to make it more understandable.

Here is the detailed version. Supposed we have a string s, we want to split it into some substrings. Each substring is different from each other. What is the maximum number of the unique substrings that we can have from one cut. In other word, you can recover s by concatenating those substrings.

Here are some examples:

Example 1
s = 'aababaa'
output = 4
Explain: we can split it into aa|b|aba|a or aab|a|b|aa, 
         and 4 is the max number of substrings we can get from one split.

Example 2
s = 'aba'
output = 2
Explain: a|ba

Example 3
s = 'aaaaaaa'
output = 3
Explain: a|aa|aaaa

Note: s only contains lowercase characters. Not sure how long s is which makes me could not guess the optimal time complexity. :(

Is it a NP-hard problem? If not, how to solve it efficiently?

I heard this problem from one of my friend. And I could not get it out. I am trying to use a Trie + greedy to solve this problem. But it fails on the first example.

Here is the trie solution that I came up with:

def triesolution(s):
    trie = {}
    p = trie
    output = 0
    for char in s:
        if char not in p:
            output += 1
            p[char] = {}
            p = trie
        else:
            p = p[char]
    return output

For example 1, the above code will return 3, since it is trying to split it into a|ab|abaa.

Add: Thanks everyone's idea, it looks like this problem is very close to an NP problem. Right now, I am trying to think it from this direction. Suppose we have a function Guess(n). This function will return True if we could find n unique substrings from one split or False otherwise. One observation here is that if Guess(n) == True, then Guess(i) == True for all i <= n. Since we can merge two adjacent substrings together. This observation can lead to a binary solution. However, it still requires we can compute the Guess function very efficiently. Sadly, I still could not find out a polynomial way to compute Guess(n).



from Max Number of unique substrings from a partition

No comments:

Post a Comment