Wednesday, July 20, 2016

[Algorithm] Remove Invalid Parentheses

Remove Invalid Parentheses


Problem Statement:

Remove the minimum number of invalid parentheses in order to make the input string valid. Return all possible results.

Note: The input string may contain letters other than the parentheses ( and ).


Examples:
"()())()" -> ["()()()", "(())()"]
"(a)())()" -> ["(a)()()", "(a())()"]
")(" -> [""]

Solution: DFS based reconstruct with pruning.


This type of problem is bound to have us visit all possible removal options. The brutal force way will be reconstruct every possible results and pick both 1) parenthesis is valid and 2) removal is minimum.

So now based on this we can further improve the solution. 

A. Pruning:

First off checking whether the resulting parenthesis is valid after we reconstruct the string is not a very efficient thing to do, it takes O(n). But while we are building up the resulting string, we can prune out those invalid ones very quickly. Let's look at some possible invalid parentheses patterns:

1. )))(((
2. ()))
3. ((())

The pattern 1 and 2 all have a position where the number of opening brackets is less than closing. Pattern 3 at the very last position have "(" not equal to ")". So as we reconstruct our resulting string, we keep track of opening brackets count and closing brackets count, and once we have (lc < rc), we can prune this sequence.

For the pattern 3, when we are at the ending of the sequence, we need to check if (lc == rc) and if not we don't keep it as valid result.

B. Minimum removed:

Now let's think about how to remove minimum brackets. Still let's look at the previously mentioned 3 patterns. For each of those pattern, we need to keep a promise that a left bracket will be closed with a right bracket.

For pattern 1, we need to remove 3 ")" and 4 "(". Because for the first 3 ")" there's 0 left brackets to be closed with. For the 4 left brackets there's 0 right brackets after them to close them.

For pattern 2, we need to remove 2 ")", because the one "(" are already closed by the first ")".

For pattern 3, we need to remove 1 "(", because the 2 ")" can close 2 "(" and leaving the last "(" to be removed.

So if we do a pre-process of the input string, we can find out the maximum ")" and "(" we need to remove in O(n) time and we can check these quantities while we are reconstructing the results.

The reconstruction of valid sequence is a DFS based and a each position we can choose to include the parenthesis or not, by opting not to include it, it's equivalent to removing it.

Code implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
class Solution {
public:
    vector<string> removeInvalidParentheses(string s) {
        int ldmax = 0;
        int rdmax = 0;
        
        for (auto c : s) {
            if (c != ')' && c != '(') continue;
            if (c == '(') {
                ldmax++;
            } else {
                if (ldmax == 0) {
                    rdmax++;
                } else {
                    ldmax--;
                }
            }
        }
        
        string path;
        unordered_set<string> uniqueResults;
        dfs(s, uniqueResults, path, 0, 0, ldmax, rdmax, 0);
        return std::vector<string>(uniqueResults.begin(), uniqueResults.end());
    }
    
    void dfs(const string& s, unordered_set<string>& result, string path, int lc, int rc, int ldmax, int rdmax, int pos) {
        if (lc < rc) return;
        if (pos >= s.size()) {
            if (lc == rc) result.insert(path);
            return;
        }
        
        char c = s[pos];
        if (c == '(') {
            dfs(s, result, path+c, lc+1, rc, ldmax, rdmax, pos+1);
            if (ldmax > 0) dfs(s, result, path, lc, rc, ldmax-1, rdmax, pos+1);
        } else if (c == ')') {
            dfs(s, result, path+c, lc, rc+1, ldmax, rdmax, pos+1);
            if (rdmax > 0) dfs(s, result, path, lc, rc, ldmax, rdmax-1, pos+1);
        } else {
            dfs(s, result, path+c, lc, rc, ldmax, rdmax, pos+1);
        }
    }
};


No comments:

Post a Comment