Find All Anagrams in a String问题

Contact me

Blog -> https://cugtyt.github.io/blog/index
Email -> cugtyt@qq.com
GitHub -> Cugtyt@GitHub

Given a string s and a non-empty string p, find all the start indices of p’s anagrams in s.

Strings consists of lowercase English letters only and the length of both strings s and p will not be larger than 20,100.

The order of output does not matter.

Example 1:

Input:
s: "cbaebabacd" p: "abc"

Output:
[0, 6]

Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".

Example 2:

Input:
s: "abab" p: "ab"

Output:
[0, 1, 2]

Explanation:
The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".

由于子串的各种排列也需要计算在内，因此为了更简便的比较子串和目标串，一个直接的想法是把要比较的串排好序，这样的话就可以直接顺序比较了，代码如下：

def findAnagrams(self, s, p):
    if len(p) > len(s):
        return []

    result = []
    p_sort = sorted(p)
    
    for i in range(0, len(s) - length + 1):
        if p_sort == sorted(s[i:i+length]):
            result.append(i)
            
    return result

但是有一个麻烦的问题是这个方法在字符串很长的情况下会超时，因为每次排序是个耗时的操作，但是我们要在无序的情况下比较自然也不能穷举，考虑到每个字符的都是小写字母，因此我们可以记录子串的个数，对26个字母进行统计，这样我们就能够在无序的情况下做到子串的比较了。这里还有一个点是我们不能每次对一个子串都重新计数，也会遇到时间很长的问题，考虑到每次移动一位的话，中间部分的信息是可以重复利用的，这是一个滑动窗，所以我们可以每次移动的时候，对滑动出去的计数减1，新进来加1，这样每次技术的计算量小了很多。这里是网上的一个代码（来源）：

from collections import Counter

def findAnagrams(self, s, p):
    res = []
    pCounter = Counter(p)
    sCounter = Counter(s[:len(p)-1])
    for i in range(len(p)-1,len(s)):
        sCounter[s[i]] += 1   # include a new char in the window
        if sCounter == pCounter:    # This step is O(1), since there are at most 26 English letters 
            res.append(i-len(p)+1)   # append the starting index
        sCounter[s[i-len(p)+1]] -= 1   # decrease the count of oldest char in the window
        if sCounter[s[i-len(p)+1]] == 0:
            del sCounter[s[i-len(p)+1]]   # remove the count if it is 0
    return res

代码的结构和上面的差不多，不过就是循环里面的处理不同，每次对计数值做修改，看是不是匹配。这里的Counter是python提供的用于计数的包，可以查阅官方文档。