Let us think about the problem of creating an optimal (i.e. longest) chunking in terms of an iterative process of chipping off equal-sized chunks from both sides of the input string $s$.
Exhaustive search. How large should these chipped-off prefixes and suffixes be at each step? Naively, we can try chipping off every possible length $k$, then recursively figure out the optimal chunking for the remainder:
$$solution(s) = \max_{k \in 1 \dots \lfloor |s|/2 \rfloor} \begin{cases} 2 + solution(s[k:-k]) & \text{if } s[:k] = s[-k:] \\ 1 & \text{otherwise} \end{cases}$$
The expression above uses Python-like notation: $s[:k]$ is the prefix of length $k$ of $s$, $s[-k:]$ is the suffix, and $s[k:-k]$ is everything but the prefix and suffix.
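For concreteness, here is a direct transcription of the recurrence in Python (a minimal sketch; the naming follows the formula above rather than any reference implementation):

```python
def solution(s):
    # An empty remainder contributes no chunks.
    if not s:
        return 0
    best = 1  # fall-back: keep all of s as a single chunk
    for k in range(1, len(s) // 2 + 1):
        # Chip matching prefix/suffix of length k off both ends.
        if s[:k] == s[-k:]:
            best = max(best, 2 + solution(s[k:-k]))
    return best
```

For example, `solution("aabaa")` returns 5, corresponding to the chunking a | a | b | a | a.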
Dynamic programming. This solution takes time exponential in the length of the string, and will only solve the easiest set of inputs. However, note that the input to $solution$ is fully defined just by its length: because we chip equal-sized chunks off both sides, the remainder is always a centered substring of $s$. It follows that there are only $O(n)$ possible different calls to $solution$, and also that we can pass that length, rather than the actual substring of $s$, to $solution$. If we memoize the function outputs, this results in an $O(n^3)$ runtime: $O(n)$ distinct calls, $O(n)$ candidate values of $k$ per call, and an $O(n)$ string comparison for each candidate. This (or the equivalent dynamic programming solution without memoization) solves the first two sets of inputs.
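One way to realize this is to memoize on the offset $j$, which determines the centered remainder $s[j:n-j]$ (a sketch under that convention):

```python
from functools import lru_cache

def solution(s):
    n = len(s)

    @lru_cache(maxsize=None)
    def solve(j):
        # Optimal chunking of the centered remainder s[j : n - j].
        m = n - 2 * j
        if m == 0:
            return 0
        best = 1
        for k in range(1, m // 2 + 1):
            if s[j:j + k] == s[n - j - k:n - j]:
                best = max(best, 2 + solve(j + k))
        return best

    return solve(0)
```

Each string comparison still costs $O(k)$, which is where the third factor of $n$ comes from.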
Greedy. We do not have to consider all values of $k$, however. A greedy approach, always chipping off the smallest possible chunk, gives optimal results. Let us sketch the proof (in the interest of space, the notation is not fully formal).
Without loss of generality, assume that the greedy algorithm and the optimal algorithm differ in the first chunk they chip off the input string $s$: the greedy algorithm selects a chunk of size $p$, and the optimal algorithm selects a strictly larger chunk of size $q$, $p < q$. Let us denote the corresponding prefixes/suffixes as $P = s[:p] = s[-p:]$ and $Q = s[:q] = s[-q:]$; since $p < q$, $P$ is both a prefix and a suffix of $Q$.
We consider two cases:
1. $2p \le q$. Here the occurrences of $P$ as a prefix and as a suffix of $Q$ do not overlap, so $Q = PXP$, where $X$ is a possibly empty string. Chunking the prefix/suffix $Q$ as $P$, $X$, $P$ therefore yields a longer chunking than the "optimal" single chunk $Q$. Contradiction.
2. $2p > q$. Now the two occurrences of $P$ in $Q$ overlap. Equating them and taking into account $2p > q$ gives us $P = XP_1$ for some prefix $P_1$ of $P$; note that $P_1$ is both a prefix and a suffix of $P$. We can keep applying the same logic to get shorter and shorter strings $P_1, P_2, \dots$, until $2|P_i| \le q$. $P_i$ is both a prefix and a suffix of $P_{i-1}$ (by the same logic that $P_1$ was), and thus of $Q$. This prefix and suffix do not overlap, because $2|P_i| \le q$. So $Q$ can be written as $P_i X P_i$ for some $X$, and could be chunked more finely. Contradiction as in case 1.
Since both cases lead to a contradiction, we conclude that the greedy solution is optimal.
A straightforward implementation of the greedy approach solves the first three sets of inputs. It still takes $O(n^2)$ time, because the string equality test (to determine whether a chunk of size $k$ can be chipped off) takes $O(k)$.
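A sketch of that straightforward implementation (the repeated slicing alone is already quadratic):

```python
def solution(s):
    chunks = 0
    while s:
        for k in range(1, len(s) // 2 + 1):
            # Chip off the smallest matching prefix/suffix pair.
            if s[:k] == s[-k:]:
                chunks += 2
                s = s[k:-k]
                break
        else:
            # No matching pair exists: the remainder is one middle chunk.
            chunks += 1
            s = ""
    return chunks
```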
Rolling hashes. As a final optimization step, we can speed up string comparison using rolling hashes. Let us denote the ASCII value of a character $c$ with $v(c)$, and choose a prime $p$ larger than any such value, e.g. $p = 131$. We can then define the hash of a sequence of characters $c_1 c_2 \dots c_m$ as $h(c_1 \dots c_m) = \left( \sum_{i=1}^{m} v(c_i)\, p^{m-i} \right) \bmod q$, where $q$ is another suitably large prime. As we grow our candidate prefix and suffix, looking for the first matching pair, we only perform the expensive string comparison when the hash values of the prefix and the suffix match. These hash values can be updated in $O(1)$ for each additional character, and strings can be compared in approximately $O(1)$ (neglecting the cases where hash comparison yields a false positive), thus roughly $O(n)$ for the input overall.
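A sketch of the hashed greedy loop under the definition above ($p = 131$ and the Mersenne prime $q = 2^{61} - 1$ are illustrative choices, not the only valid ones):

```python
def solution(s, p=131, q=(1 << 61) - 1):
    chunks, lo, hi = 0, 0, len(s)  # the unprocessed window is s[lo:hi]
    while lo < hi:
        h_pre = h_suf = 0  # hashes of s[lo:lo+k] and s[hi-k:hi]
        power = 1          # p**(k-1) mod q
        found = False
        for k in range(1, (hi - lo) // 2 + 1):
            # Grow the prefix rightwards and the suffix leftwards, O(1) each.
            h_pre = (h_pre * p + ord(s[lo + k - 1])) % q
            h_suf = (ord(s[hi - k]) * power + h_suf) % q
            power = power * p % q
            # Compare the actual strings only on a hash match.
            if h_pre == h_suf and s[lo:lo + k] == s[hi - k:hi]:
                chunks += 2
                lo, hi = lo + k, hi - k
                found = True
                break
        if not found:
            chunks += 1  # the remainder is one middle chunk
            break
    return chunks
```

Keeping the final string comparison makes the answer exact even when two hashes collide.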
Deterministic time complexity. A greedy solution with naive string equality testing is not that bad as long as the chunks are small enough. On the other hand, we could process the entire remaining string to find the smallest chunk in $O(n)$; we can use the Z-algorithm or KMP's failure function for this purpose. But that would again waste a lot of time if the smallest chunk turns out to be small. However, we can make a compromise: choose a length threshold $t$ and use naive equality testing for chunk sizes $k \le t$. If that doesn't yield a solution, process the entire string. The worst-case time complexity is $O(nt + n^2/t)$. Choosing $t = \sqrt{n}$ gives an $O(n\sqrt{n})$ solution.
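A sketch of this compromise, with the standard Z-algorithm handling the expensive phase (the names `smallest_chunk` and `threshold` are ours):

```python
import math

def z_function(t):
    # Standard Z-algorithm: z[i] = length of the longest common
    # prefix of t and t[i:].
    n = len(t)
    z = [0] * n
    l = r = 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and t[z[i]] == t[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z

def smallest_chunk(t, threshold):
    # Cheap phase: naive equality tests for k <= threshold.
    for k in range(1, min(threshold, len(t) // 2) + 1):
        if t[:k] == t[-k:]:
            return k
    # Expensive phase: one O(len(t)) pass.
    # t[:k] == t[-k:] exactly when z[len(t) - k] == k.
    z = z_function(t)
    for k in range(threshold + 1, len(t) // 2 + 1):
        if z[len(t) - k] == k:
            return k
    return None  # no split possible

def solution(s):
    threshold = max(1, math.isqrt(len(s)))
    chunks, t = 0, s
    while t:
        k = smallest_chunk(t, threshold)
        if k is None:
            return chunks + 1  # the remainder is one middle chunk
        chunks += 2
        t = t[k:-k]
    return chunks
```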
A deterministic linear-time solution also exists, although we are not aware of one that is practical to implement in a contest. Let us build a suffix array of $s$ and store for every suffix $s[i:]$ its index $r_i$ in the suffix array. Assume that we have already chipped off $j$ characters on each side of $s$, and that we are trying to determine if $k$ is a good size for the next prefix and suffix; in other words, if $s[j:j+k] = s[n-j-k:n-j]$. The size of a chunk $k$ defines a range $[a, b]$ of relevant suffixes in the suffix array (those that begin with $s[j:j+k]$), and $k$ is a good size exactly when $r_{n-j-k} \in [a, b]$. The bottleneck of this solution is in adjusting the bounds of the range $a$ and $b$ when increasing $k$. Building a suffix tree to traverse and determine the bounds $a$ and $b$, instead of binary searching over a suffix array, leads to a linear-time solution.
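As a toy illustration of the suffix-array check (with a naive quadratic construction and a fresh binary search per query instead of the incremental bound adjustment described above; all names are ours):

```python
import bisect

def build_suffix_array(s):
    # Naive O(n^2 log n) construction -- for illustration only.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    rank = [0] * len(s)
    for idx, start in enumerate(sa):
        rank[start] = idx
    return sa, rank

def is_good_size(s, sa, rank, j, k):
    # Size k is good exactly when s[j:j+k] == s[n-j-k:n-j], i.e. when the
    # suffix starting at n-j-k lies in the suffix-array range of suffixes
    # beginning with s[j:j+k]. Assumes 2 * (j + k) <= len(s).
    n = len(s)
    pattern = s[j:j + k]
    heads = [s[i:i + k] for i in sa]  # O(n*k) here; the real solution avoids this
    a = bisect.bisect_left(heads, pattern)
    b = bisect.bisect_right(heads, pattern)
    return a <= rank[n - j - k] < b
```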