Find All Anagrams in a String问题
Contact me
- Blog -> https://cugtyt.github.io/blog/index
- Email -> cugtyt@qq.com
- GitHub -> Cugtyt@GitHub
题目来源:
Given a string s and a non-empty string p, find all the start indices of p’s anagrams in s.
Strings consists of lowercase English letters only and the length of both strings s and p will not be larger than 20,100.
The order of output does not matter.
Example 1:
Input:
s: "cbaebabacd" p: "abc"
Output:
[0, 6]
Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".
Example 2:
Input:
s: "abab" p: "ab"
Output:
[0, 1, 2]
Explanation:
The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".
由于子串的各种排列也需要计算在内,因此为了更简便的比较子串和目标串,一个直接的想法是把要比较的串排好序,这样的话就可以直接顺序比较了,代码如下:
def findAnagrams(self, s, p):
if len(p) > len(s):
return []
result = []
p_sort = sorted(p)
for i in range(0, len(s) - length + 1):
if p_sort == sorted(s[i:i+length]):
result.append(i)
return result
但是有一个麻烦的问题是这个方法在字符串很长的情况下会超时,因为每次排序是个耗时的操作,但是我们要在无序的情况下比较自然也不能穷举,考虑到每个字符的都是小写字母,因此我们可以记录子串的个数,对26个字母进行统计,这样我们就能够在无序的情况下做到子串的比较了。这里还有一个点是我们不能每次对一个子串都重新计数,也会遇到时间很长的问题,考虑到每次移动一位的话,中间部分的信息是可以重复利用的,这是一个滑动窗,所以我们可以每次移动的时候,对滑动出去的计数减1,新进来加1,这样每次技术的计算量小了很多。这里是网上的一个代码(来源):
from collections import Counter
def findAnagrams(self, s, p):
res = []
pCounter = Counter(p)
sCounter = Counter(s[:len(p)-1])
for i in range(len(p)-1,len(s)):
sCounter[s[i]] += 1 # include a new char in the window
if sCounter == pCounter: # This step is O(1), since there are at most 26 English letters
res.append(i-len(p)+1) # append the starting index
sCounter[s[i-len(p)+1]] -= 1 # decrease the count of oldest char in the window
if sCounter[s[i-len(p)+1]] == 0:
del sCounter[s[i-len(p)+1]] # remove the count if it is 0
return res
代码的结构和上面的差不多,不过就是循环里面的处理不同,每次对计数值做修改,看是不是匹配。这里的Counter是python提供的用于计数的包,可以查阅官方文档。