动态规划求解不同子串数目

Contact me:

Blog -> https://cugtyt.github.io/blog/index
Email -> cugtyt@qq.com
GitHub -> Cugtyt@GitHub

Given a string S and a string T, count the number of distinct subsequences of S which equals T.

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters.
(ie, “ACE” is a subsequence of “ABCDE” while “AEC” is not).

Here is an example:
S = “rabbbit”, T = “rabbit” Return 3.**

from discuss in leetcode，大神回答

用到了动态规划，也就是划分子问题。这里以字符为单位，本问题可以分为当前字符匹配+当前字符前面的子串自问题，具体如下。

The idea is the following:

we will build an array mem where mem[i+1][j+1] means that S[0..j] contains T[0..i] that many times as distinct subsequences. Therefor the result will be mem[T.length()][S.length()]. we can build this array rows-by-rows:

the first row must be filled with 1. That’s because the empty string is a subsequence of any string but only 1 time. So mem[0][j] = 1 for every j. So with this we not only make our lives easier, but we also return correct value if T is an empty string.

the first column of every rows except the first must be 0. This is because an empty string cannot contain a non-empty string as a substring – the very first item of the array: mem[0][0] = 1, because an empty string contains the empty string 1 time.

So the matrix looks like this:

  S 0123....j  
T +----------+  
  |1111111111|  
0 |0         |  
1 |0         |  
2 |0         |  
. |0         |  
. |0         |  
i |0         |  

From here we can easily fill the whole grid: for each (x, y), we check if S[x] == T[y] we add the previous item and the previous item in the previous row, otherwise we copy the previous item in the same row. The reason is simple:

if the current character in S doesn’t equal to current character T, then we have the same number of distinct subsequences as we had without the new character.

if the current character in S equal to the current character T, then the distinct number of subsequences: the number we had before plus the distinct number of subsequences we had with less longer T and less longer S.

An example:

S: [acdabefbc] 
T: [ab]

first we check with a:

           *  *
      S = [acdabefbc]
mem[1] = [0111222222]
then we check with ab:
               *  * ]
      S = [acdabefbc]
mem[1] = [0111222222]
mem[2] = [0000022244]

And the result is 4, as the distinct subsequences are:

S = [a   b    ]
S = [a      b ]
S = [   ab    ]
S = [   a   b ]

class Solution {
    public int numDistinct(String S, String T) {
        int[][] mem = new int[T.length()+1][S.length()+1];

        // filling the first row: with 1s
        for(int j=0; j<=S.length(); j++) {
            mem[0][j] = 1;
        }
    
        // the first column is 0 by default in every other rows but the first, which we need.
    
        for(int i=0; i<T.length(); i++) {
            for(int j=0; j<S.length(); j++) {
                if(T.charAt(i) == S.charAt(j)) {
                    mem[i+1][j+1] = mem[i][j] + mem[i+1][j];
                } else {
                    mem[i+1][j+1] = mem[i+1][j];
                }
            }
        }
    
        return mem[T.length()][S.length()];

    }
}