The LPS (Longest Proper Prefix which is also a Suffix) algorithm goes as follows:
public static int[] constructLPSArray(String s) {
int n = s.length();
int[] arr = new int[n];
int j = 0;
for (int i = 1; i < n; ) {
if (s.charAt(i) == s.charAt(j)) {
arr[i] = j + 1;
i++;
j++;
} else {
if (j != 0) {
j = arr[j - 1];
} else {
i++;
}
}
}
return arr;
}
The if (s.charAt(i) == s.charAt(j)) part looks clear, but the else part is unclear.
Why do we do:
if (j != 0) {
j = arr[j - 1];
} else {
i++;
}
More specifically, why does j = arr[j - 1] work ? Or why do we even do it? How do we validate the correctness of this step?
Let's say we are parsing an array of characters with
iandjpositioned like this:with
arrholding:i. e., the length of the longest prefix/suffix of each substring of s of that length until
i. You can probably guess how that was generated from the rest of the algorithm. Now, if the next character afteridoes not match the next character afterj,we don't have to retry the matching, because we know the longest prefix/suffix of our previous prefix/suffix! Looking up
arr[j - 1]yields 2 – so we essentially cached the information that the parts highlighted hereare identical, and don't need to be compared again!