Tuesday, July 19, 2016

[Algorithm] Distinct Subsequences


Distinct Subsequences


Problem Statement:

Given a string S and a string T, count the number of distinct subsequences of T in S.

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).

Here is an example:
S = "rabbbit", T = "rabbit"
Return 3.



Solution: Dynamic Programming


We can try to solve this problem on paper, and usually when I'm approaching this kind of problem, I'd like to draw a m * n board. So we have the following board with the example given in problem statement.


The green shaded boxes are potential matches and we can view each row as given the string T ended at the character of that row, what's the maximum matches possible. Here's the partially filled board:



The "bbb" area is colored differently so we can explain it more clearly. For string sequence ending with "a", it's clear that we only have 1 possible match. When we are processing string ending with "b" (row index 3), we can have:

1 possible match at table[3][3], shaded red.
2 possible matches at table[3][4], shaded yellow, which is 1) match table[3][3] ignore table[3][4] and 2) match table[3][4] ignore table[3][3].

And at row index 4, the second "b" in the string T, we have:

1 possible match at table[4][4], which is matching table[3][3].
3 possible matches at table[4][5], which are:
   1) match table[3][3] and table[4][4], ignore table[4][5]
   2) match table[4][5] and inherit 2 possible matches at table[3][4]

Also please note that if s[i] != s[j], the only possible match we can have is skip table[i][j] and copy over match at table[i][j-1]

So the pattern of calculation is becoming clear:

1) If s[i] == s[j], we can skip self or choose to match and inherit from table[i-1][j-1], which gives us: table[i][j] = table[i-1][j-1] + table[i][j-1]
2) If s[i] != s[j], only option for us is to skip match at table[i][j] which means table[i][j] = table[i][j-1]

Code implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
class Solution {
public:
    int numDistinct(string s, string t) {
        int n = s.size();
        int m = t.size();
        
        vector<vector<int> > table(m+1, vector<int>(n+1));
        
        for (int i=0; i<=m; i++) {
            for (int j=0; j<=n; j++) {
                if (i == 0) table[i][j] = 1;
                else if (j == 0) table[i][j] = 0;
                else {
                    table[i][j] = (s[j-1] == t[i-1]) * table[i-1][j-1] + table[i][j-1];
                }
            }
        }
        
        return table[m][n];
    }
};



No comments:

Post a Comment