Z Algorithm: Efficient Substring Search
String matching often calls for an algorithm that finds every occurrence of a pattern within a longer text. Of the many algorithms available, the Z algorithm stands out because it does so in time linear in the combined length of the pattern and the text. This article explores how the Z algorithm works and how to use it for substring search in JavaScript.
Understanding the Z Algorithm
The Z algorithm is primarily used for pattern searching in strings. It constructs a Z-array from the input string, which helps in optimizing the search for the pattern within the string.
To search for a pattern, we concatenate the pattern and the text into a single string str, joined by a special character that appears in neither of them (typically $). For instance, given a pattern P and a text T, the combined string is P$T.
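The Z-array records how far each position of str matches the beginning of str. Specifically, Z[i] is the length of the longest substring starting at index i that is also a prefix of str. The matched segments themselves are often called Z-boxes. In the combined string P$T, any position where Z[i] equals the length of P marks an occurrence of the pattern in the text.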
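To make the definition concrete, the sketch below computes the Z-array directly from the definition, comparing every suffix against the prefix character by character. It is a quadratic illustration only, not the Z algorithm itself, and the name naiveZArray is a hypothetical helper introduced here. Z[0] is left at 0 by convention.

```javascript
// Definition-based (quadratic) Z-array, for illustration only.
// naiveZArray is a hypothetical helper name, not part of the article's code.
function naiveZArray(str) {
  const n = str.length;
  const z = new Array(n).fill(0); // z[0] is left as 0 by convention here
  for (let i = 1; i < n; i++) {
    // Extend the match between the prefix and the substring starting at i.
    while (i + z[i] < n && str[z[i]] === str[i + z[i]]) {
      z[i]++;
    }
  }
  return z;
}

console.log(naiveZArray("aabxaab"));  // → [0, 1, 0, 0, 3, 1, 0]
console.log(naiveZArray("ab$cabab")); // → [0, 0, 0, 0, 2, 0, 2, 0]
```

In the second call the string is the pattern "ab", the separator, and the text "cabab". The entries equal to 2 sit at indices 4 and 6 of the combined string, which correspond to indices 1 and 3 of the text.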
Implementing the Z Algorithm in JavaScript
Let’s begin by constructing our Z-array:
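function constructZArray(str) {
  const n = str.length;
  const Z = Array(n).fill(0); // Z[0] stays 0 by convention
  // [left, right] is the rightmost segment found so far that matches a prefix of str.
  let left = 0, right = 0;
  for (let i = 1; i < n; i++) {
    if (i <= right) {
      // Inside the segment: reuse the value at the mirrored position, capped at the segment's end.
      Z[i] = Math.min(right - i + 1, Z[i - left]);
    }
    // Extend the match character by character.
    while (i + Z[i] < n && str[Z[i]] === str[i + Z[i]]) {
      Z[i]++;
    }
    // Move the segment if this match ends further to the right.
    if (i + Z[i] - 1 > right) {
      left = i;
      right = i + Z[i] - 1;
    }
  }
  return Z;
}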
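One way to see why reusing earlier matches pays off is to count character comparisons. The sketch below instruments the same loop and a definition-based loop. Both function names are hypothetical and exist only for this experiment. A string of identical characters is a bad case for the definition-based approach, while the Z algorithm needs fewer than 2n comparisons on any input of length n. Each successful comparison pushes the right boundary to a new position, and each index i fails at most once.

```javascript
// Hypothetical instrumented variants: they return the number of character
// comparisons performed rather than the Z-array itself.
function countComparisonsZ(str) {
  const n = str.length;
  const z = new Array(n).fill(0);
  let left = 0, right = 0, comparisons = 0;
  for (let i = 1; i < n; i++) {
    if (i <= right) {
      z[i] = Math.min(right - i + 1, z[i - left]);
    }
    while (i + z[i] < n) {
      comparisons++;
      if (str[z[i]] !== str[i + z[i]]) break;
      z[i]++;
    }
    if (i + z[i] - 1 > right) {
      left = i;
      right = i + z[i] - 1;
    }
  }
  return comparisons;
}

function countComparisonsNaive(str) {
  const n = str.length;
  let comparisons = 0;
  for (let i = 1; i < n; i++) {
    let len = 0; // always restart from scratch, with no reuse of earlier matches
    while (i + len < n) {
      comparisons++;
      if (str[len] !== str[i + len]) break;
      len++;
    }
  }
  return comparisons;
}

const worstCase = "a".repeat(1000);
console.log(countComparisonsZ(worstCase));     // → 999
console.log(countComparisonsNaive(worstCase)); // → 499500
```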
Now, using the Z-array, we can efficiently search for our pattern:
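function ZSearch(text, pattern) {
  const m = pattern.length;
  // An empty pattern would match every zero entry of Z; returning no matches is a choice made here.
  if (m === 0) return [];
  // Assumes "$" occurs in neither text nor pattern.
  const combined = pattern + "$" + text;
  const Z = constructZArray(combined);
  const result = [];
  // Positions 0..m hold the pattern and the separator, so start just after them.
  for (let i = m + 1; i < Z.length; i++) {
    if (Z[i] === m) {
      result.push(i - m - 1); // Convert to an index in text.
    }
  }
  return result; // Indices in text where pattern starts.
}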
ZSearch returns every index in the text at which an occurrence of the pattern starts, including overlapping occurrences.
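The separator approach relies on $ appearing in neither string. If it does appear, a match can extend past the separator and be missed. One possible way around this, sketched below under a hypothetical name, is to never build the combined string. Instead it reads characters through a small accessor that reports -1 at the separator position. Since charCodeAt never returns a negative value, the separator cannot equal any real character. This is a variant for illustration, not the article's ZSearch, and returning no matches for an empty pattern is an assumption made here.

```javascript
// Hypothetical variant: Z-based search over a virtual "pattern + SEP + text"
// string in which SEP (code -1) can never equal a real character.
function zSearchVirtualSeparator(text, pattern) {
  const m = pattern.length;
  if (m === 0) return []; // assumption: an empty pattern yields no matches
  const n = m + 1 + text.length;
  // Character code at position k of the virtual string; -1 marks the separator.
  const at = (k) => {
    if (k < m) return pattern.charCodeAt(k);
    if (k === m) return -1;
    return text.charCodeAt(k - m - 1);
  };
  const z = new Array(n).fill(0);
  const result = [];
  let left = 0, right = 0;
  for (let i = 1; i < n; i++) {
    if (i <= right) {
      z[i] = Math.min(right - i + 1, z[i - left]);
    }
    while (i + z[i] < n && at(z[i]) === at(i + z[i])) {
      z[i]++;
    }
    if (i + z[i] - 1 > right) {
      left = i;
      right = i + z[i] - 1;
    }
    // A full-length match can only occur after the separator.
    if (z[i] === m) {
      result.push(i - m - 1);
    }
  }
  return result;
}

console.log(zSearchVirtualSeparator("cabab", "ab"));  // → [1, 3]
console.log(zSearchVirtualSeparator("aaaa", "aa"));   // → [0, 1, 2] (overlaps included)
console.log(zSearchVirtualSeparator("a$a$a", "a$a")); // → [0, 2]
```

The last call is the case a literal "$" separator would get wrong, because the pattern and the text both contain that character.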
Conclusion
The Z algorithm is a powerful technique for pattern searching. It avoids redundant comparisons by reusing matches it has already found while building the Z-array over the combined pattern and text. For a pattern of length m and a text of length n, it runs in O(n + m) time and uses O(n + m) extra space for the Z-array. That makes it an efficient alternative to naive search, whose worst case is O(n · m), and particularly useful in applications where performance is crucial.