Suppose we have a n letter string and we are searching for most repeated m letter substring (1=<m =< n). I am just searching for an algorithm which solves this problem in linear time. And I have reached to suffix tree. But how can I solve it by suffix tree?
Thanks a lot.
Idea
You can also solve it with hash function.
We can convert strings to base
bnumbers wherebis a prime number.For example, if the strings only consist of lowercase alphabet (26 characters
a-z) then we can choosebequals29.Then we map string characters to corresponding numbers. For example:
String
abcwill equals29^2*1 + 29^1*2 + 29^0*3 = 899in base29.We should map
a -> 1but nota -> 0since hash value ofaaaandaawill be equal in baseb, which shouldn't be.Now instead of compare two strings, we can compares their hash value in base
b. If their hash value are equal then we can say they are equal.Since hash value can be very large, you can use it's module a large prime number, for example
mod 1e9+7. The posibility of two different strings have same hash value is very low in this case.Algorithm
The algorithm can be described as bellow:
To calculate
hash(s), first we build arrayHwhere:Here
S[i]is the mapped number of character i-th of string S.To calculate
b^x, we can calculate arraypowbwhere:Then for a substring
s[l..r]of string S,As we can see,
hash(s)can be negative, in this case we should addmodtohash(s)(hash(s) += mod).Complexity