Friday, July 29, 2016

[Algorithm] Self Crossing


Self Crossing


Problem Statement:

You are given an array x of n positive numbers. You start at point (0,0) and moves x[0] metres to the north, then x[1] metres to the west, x[2] metres to the south, x[3] metres to the east and so on. In other words, after each move your direction changes counter-clockwise.

Write a one-pass algorithm with O(1) extra space to determine, if your path crosses itself, or not.




Solution: Polygon pattern, O(n) solution


At first glance it may seem to be very difficult to solve. I initially wanted to keep track of the points and use bounding box to determine the crossing but still cannot get it right.

Now the solution for this is not very easy to come up with, I read someone's post on how to solve this and I'm just trying to explain that solution here.

Basically since we are only allowed to go straight left, right, up and down and in a counter-clockwise order, the shape of the points movement forms polygons, muck like tetrix blocks really.


These blocks indeed are just two kinds, one is rectangle the other is L-shape pentagon. The rectangle shape would have two different sub types, one is made up by 4 lines, the other is made up by 5 lines.


For three types there are constraints:

Type 1: x[n-2] <= x[n] && x[n-1] <= x[n-3]

Type 2: x[n] + x[n-4] >= x[n-2] && x[n-1] == x[n-3]

Type 3: x[n] + x[n-4] >= x[n-2] && x[n-1] + x[n-5] >= x[n-3] && x[n-1] <= x[n-3] && x[n-2] >= x[n-4]

Those shapes are orientation-less, meaning orientation doesn't matter for these three patterns, one can rotate them and those conditions still hold. This means that for any point in the given sequence we just need to check those 3 different constraints, if any one of them breaks there must be a crossing. No previous states need to be memorized.


Code implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
class Solution {
public:
    bool isSelfCrossing(vector<int>& x) {
        int ret = check(x);
        return ret != 0;
    }
    
    inline int check(const vector<int>& x) {
        int n = x.size();
        for (int i=0; i<n; i++) {
            if (i >= 3 && x[i-2] <= x[i] && x[i-1] <= x[i-3]) return 1;
            if (i >= 4 && x[i] + x[i-4] >= x[i-2] && x[i-1] == x[i-3]) return 2;
            if (i >= 5 && x[i] + x[i-4] >= x[i-2] && x[i-1] + x[i-5] >= x[i-3] && x[i-1] <= x[i-3] && x[i-2] >= x[i-4]) return 3;
        } 
        return 0;
    }
};




[Algorithm] House Robber III


House Robber III


Problem statement


The thief has found himself a new place for his thievery again. There is only one entrance to this area, called the "root." Besides the root, each house has one and only one parent house. After a tour, the smart thief realized that "all houses in this place forms a binary tree". It will automatically contact the police if two directly-linked houses were broken into on the same night.


Example 1:
     3
    / \
   2   3
    \   \ 
     3   1
Maximum amount of money the thief can rob = 3 + 3 + 1 = 7.
Example 2:
     3
    / \
   4   5
  / \   \ 
 1   3   1
Maximum amount of money the thief can rob = 4 + 5 = 9.

Determine the maximum amount of money the thief can rob tonight without alerting the police.



Solution: DFS + DP


At each node the maximum amount we can rob is determined by whether we want to rob the current house or skip this house. To keep the constraint that two connected house cannot be robbed, we can utilize post order DFS. In post order DFS we just need to check one side of the constraint which is current house and its immediate children. The other side which is between immediate parent and this house will be checked when the parent node is visited.

Like other house robbery problem, we keep the maximum loot we can have at each node and we also need to know how much we can get if we skip current house, in other words the dp[i] = max((dp[i] + dp[i-2]), dp[i-1]) state transfer still holds here. Thus we just need to add two return values (as reference), one to represent dp[i-1] and one to represent dp[i-2].

Code implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
class Solution {
public:
    int rob(TreeNode* root) {
        int child = 0, grandchild = 0;
        postDFS(root, child, grandchild);
        return max(child, grandchild);
    }
    
    /* 
    * current loot means maximum loot starting at this node, 
    * it can come from either taking this node or skiping this node whichever is larger
    */
    void postDFS(TreeNode* root, int& currLoot, int& childLoot) {
        if (root == NULL) {
            currLoot = 0;
            childLoot = 0;
            return;
        }
        
        int left = 0, leftgrand = 0;
        postDFS(root->left, left, leftgrand);
        
        int right = 0, rightgrand = 0;
        postDFS(root->right, right, rightgrand);
        
        int take = root->val + leftgrand + rightgrand;
        int skip = left + right;
        
        currLoot = max(take, skip);
        childLoot = skip;
    }
};






Thursday, July 28, 2016

[Algorithm] Longest Increasing Path in a Matrix

Longest Increasing Path in a Matrix


Problem Statement:

Given an integer matrix, find the length of the longest increasing path.

From each cell, you can either move to four directions: left, right, up or down. You may NOT move diagonally or move outside of the boundary (i.e. wrap-around is not allowed).


Example 1:
nums = [
  [9,9,4],
  [6,6,8],
  [2,1,1]
]
Return 4
The longest increasing path is [1, 2, 6, 9].
Example 2:
nums = [
  [3,4,5],
  [3,2,6],
  [2,2,1]
]
Return 4
The longest increasing path is [3, 4, 5, 6]. Moving diagonally is not allowed.

Solution: Topological Sort + DP


The key to solving this problem is that we can view the matrix as a directed graph, each point in the matrix is a vertex and edge only exists among neighboring points when two point has increasing order the direction is from the smaller to the larger number (eg. matrix[i][j] -> matrix[i][j-1], if matrix[i][j] < matrix[i][j-1]). Topological sort can help us group the paths in this graph. And we can compare the length of different paths originated from each individual points in the matrix.

For example, in the following matrix, we could have multiple paths originating from "4" and we can find them by performing topological sort.


Another thing to notice is that we can update the start of previous found sequence as we pick another unvisited point as the starting point for another topological sort. Thus concatenating the sequence. For example, when we start with "4" we can find a sequence of "4->5->6->7->8->9->10" and when we start from "3" which is not visited the sequence "3" can be concatenated to "3->4->5->6->7->8->9->10" because there exists an edge "3->4".


Since our target is to find out the longest sequence, the DP concept comes into play. We know that we can concatenate topological order to form longer sequence, we just need to keep track of the longest sequence starting from each point in the matrix and we can also use this DP matrix as "visited" flag for all the points.

Code implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
class Solution {
public:
    int longestIncreasingPath(vector<vector<int>>& matrix) {
        int n = matrix.size();
        if (n == 0) return 0;
        int m = matrix[0].size();
        if (m == 0) return 0;
        
        vector<vector<int> > topo(n, vector<int>(m, -1));
        int maxLen = INT_MIN;
        for (int i=0; i<n; i++) {
            for (int j=0; j<m; j++) {
                if (topo[i][j] == -1) {
                    dfs(matrix, i, j, topo);
                }
                maxLen = max(maxLen, topo[i][j]);
            }
        }
        
        return maxLen;
    }
    
    void dfs(const vector<vector<int>>& matrix, int i, int j, vector<vector<int>>& topo) {
        topo[i][j] = 1;
        
        /* visting neighbor */
        if (i-1>=0 && matrix[i][j] < matrix[i-1][j]) {
            if (topo[i-1][j] == -1) {
                dfs(matrix, i-1, j, topo);
            }
            topo[i][j] = max(topo[i][j], topo[i-1][j]+1);
        }
        
        if (i+1<matrix.size() && matrix[i][j] < matrix[i+1][j]) {
            if (topo[i+1][j] == -1) {
                dfs(matrix, i+1, j, topo);
            }
            topo[i][j] = max(topo[i][j], topo[i+1][j]+1);
        }
        
        if (j-1>=0 && matrix[i][j] < matrix[i][j-1]) {
            if (topo[i][j-1] == -1) {
                dfs(matrix, i, j-1, topo);
            }
            topo[i][j] = max(topo[i][j], topo[i][j-1]+1);
        }
        
        if (j+1<matrix[0].size() && matrix[i][j] < matrix[i][j+1]) {
            if (topo[i][j+1] == -1) {
                dfs(matrix, i, j+1, topo);
            }
            topo[i][j] = max(topo[i][j], topo[i][j+1]+1);
        }
    }
};



Wednesday, July 20, 2016

[Algorithm] Remove Invalid Parentheses

Remove Invalid Parentheses


Problem Statement:

Remove the minimum number of invalid parentheses in order to make the input string valid. Return all possible results.

Note: The input string may contain letters other than the parentheses ( and ).


Examples:
"()())()" -> ["()()()", "(())()"]
"(a)())()" -> ["(a)()()", "(a())()"]
")(" -> [""]

Solution: DFS based reconstruct with pruning.


This type of problem is bound to have us visit all possible removal options. The brutal force way will be reconstruct every possible results and pick both 1) parenthesis is valid and 2) removal is minimum.

So now based on this we can further improve the solution. 

A. Pruning:

First off checking whether the resulting parenthesis is valid after we reconstruct the string is not a very efficient thing to do, it takes O(n). But while we are building up the resulting string, we can prune out those invalid ones very quickly. Let's look at some possible invalid parentheses patterns:

1. )))(((
2. ()))
3. ((())

The pattern 1 and 2 all have a position where the number of opening brackets is less than closing. Pattern 3 at the very last position have "(" not equal to ")". So as we reconstruct our resulting string, we keep track of opening brackets count and closing brackets count, and once we have (lc < rc), we can prune this sequence.

For the pattern 3, when we are at the ending of the sequence, we need to check if (lc == rc) and if not we don't keep it as valid result.

B. Minimum removed:

Now let's think about how to remove minimum brackets. Still let's look at the previously mentioned 3 patterns. For each of those pattern, we need to keep a promise that a left bracket will be closed with a right bracket.

For pattern 1, we need to remove 3 ")" and 4 "(". Because for the first 3 ")" there's 0 left brackets to be closed with. For the 4 left brackets there's 0 right brackets after them to close them.

For pattern 2, we need to remove 2 ")", because the one "(" are already closed by the first ")".

For pattern 3, we need to remove 1 "(", because the 2 ")" can close 2 "(" and leaving the last "(" to be removed.

So if we do a pre-process of the input string, we can find out the maximum ")" and "(" we need to remove in O(n) time and we can check these quantities while we are reconstructing the results.

The reconstruction of valid sequence is a DFS based and a each position we can choose to include the parenthesis or not, by opting not to include it, it's equivalent to removing it.

Code implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
class Solution {
public:
    vector<string> removeInvalidParentheses(string s) {
        int ldmax = 0;
        int rdmax = 0;
        
        for (auto c : s) {
            if (c != ')' && c != '(') continue;
            if (c == '(') {
                ldmax++;
            } else {
                if (ldmax == 0) {
                    rdmax++;
                } else {
                    ldmax--;
                }
            }
        }
        
        string path;
        unordered_set<string> uniqueResults;
        dfs(s, uniqueResults, path, 0, 0, ldmax, rdmax, 0);
        return std::vector<string>(uniqueResults.begin(), uniqueResults.end());
    }
    
    void dfs(const string& s, unordered_set<string>& result, string path, int lc, int rc, int ldmax, int rdmax, int pos) {
        if (lc < rc) return;
        if (pos >= s.size()) {
            if (lc == rc) result.insert(path);
            return;
        }
        
        char c = s[pos];
        if (c == '(') {
            dfs(s, result, path+c, lc+1, rc, ldmax, rdmax, pos+1);
            if (ldmax > 0) dfs(s, result, path, lc, rc, ldmax-1, rdmax, pos+1);
        } else if (c == ')') {
            dfs(s, result, path+c, lc, rc+1, ldmax, rdmax, pos+1);
            if (rdmax > 0) dfs(s, result, path, lc, rc, ldmax, rdmax-1, pos+1);
        } else {
            dfs(s, result, path+c, lc, rc, ldmax, rdmax, pos+1);
        }
    }
};


Tuesday, July 19, 2016

[Algorithm] Distinct Subsequences


Distinct Subsequences


Problem Statement:

Given a string S and a string T, count the number of distinct subsequences of T in S.

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).

Here is an example:
S = "rabbbit", T = "rabbit"
Return 3.



Solution: Dynamic Programming


We can try to solve this problem on paper, and usually when I'm approaching this kind of problem, I'd like to draw a m * n board. So we have the following board with the example given in problem statement.


The green shaded boxes are potential matches and we can view each row as given the string T ended at the character of that row, what's the maximum matches possible. Here's the partially filled board:



The "bbb" area is colored differently so we can explain it more clearly. For string sequence ending with "a", it's clear that we only have 1 possible match. When we are processing string ending with "b" (row index 3), we can have:

1 possible match at table[3][3], shaded red.
2 possible matches at table[3][4], shaded yellow, which is 1) match table[3][3] ignore table[3][4] and 2) match table[3][4] ignore table[3][3].

And at row index 4, the second "b" in the string T, we have:

1 possible match at table[4][4], which is matching table[3][3].
3 possible matches at table[4][5], which are:
   1) match table[3][3] and table[4][4], ignore table[4][5]
   2) match table[4][5] and inherit 2 possible matches at table[3][4]

Also please note that if s[i] != s[j], the only possible match we can have is skip table[i][j] and copy over match at table[i][j-1]

So the pattern of calculation is becoming clear:

1) If s[i] == s[j], we can skip self or choose to match and inherit from table[i-1][j-1], which gives us: table[i][j] = table[i-1][j-1] + table[i][j-1]
2) If s[i] != s[j], only option for us is to skip match at table[i][j] which means table[i][j] = table[i][j-1]

Code implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
class Solution {
public:
    int numDistinct(string s, string t) {
        int n = s.size();
        int m = t.size();
        
        vector<vector<int> > table(m+1, vector<int>(n+1));
        
        for (int i=0; i<=m; i++) {
            for (int j=0; j<=n; j++) {
                if (i == 0) table[i][j] = 1;
                else if (j == 0) table[i][j] = 0;
                else {
                    table[i][j] = (s[j-1] == t[i-1]) * table[i-1][j-1] + table[i][j-1];
                }
            }
        }
        
        return table[m][n];
    }
};