Let us think about the problem of creating an optimal (i.e. longest) chunking in terms of an iterative process of chipping off equal-sized chunks from both sides of the input string $s$.
Exhaustive search. How large should these chipped-off prefixes and suffixes be at each step? Naively, we can try chipping off every possible length $k$, then recursively figure out the optimal chunking for the remainder:
$$solution(s) = \max_{k \in 1 \dots \lfloor |s|/2 \rfloor} \begin{cases} 2 + solution(s[k:-k]) & \text{if } s[:k] = s[-k:] \\ 1 & \text{otherwise} \end{cases}$$
The expression above uses Python-like notation: $s[:k]$ is the prefix of length $k$ of $s$, $s[-k:]$ is the suffix, and $s[k:-k]$ is everything but the prefix and suffix.
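For concreteness, here is a direct transcription of the recurrence in Python (a minimal sketch; the naming follows the formula above rather than any reference implementation):

```python
def solution(s):
    # An empty remainder contributes no chunks.
    if not s:
        return 0
    best = 1  # fall-back: keep all of s as a single chunk
    for k in range(1, len(s) // 2 + 1):
        # Chip matching prefix/suffix of length k off both ends.
        if s[:k] == s[-k:]:
            best = max(best, 2 + solution(s[k:-k]))
    return best
```

For example, `solution("aabaa")` returns 5, corresponding to the chunking a | a | b | a | a.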
Dynamic programming. This solution takes time exponential in the length of the string, and will only solve the easiest set of inputs. However, note that the input to $solution$ is fully defined just by its length: because we chip equal-sized chunks off both sides, the remainder is always a centered substring of $s$. It follows that there are only $O(n)$ possible different calls to $solution$, and also that we can pass that length, rather than the actual substring of $s$, to $solution$. If we memoize the function outputs, this results in an $O(n^3)$ runtime: $O(n)$ distinct calls, $O(n)$ candidate values of $k$ per call, and an $O(n)$ string comparison for each candidate. This (or the equivalent dynamic programming solution without memoization) solves the first two sets of inputs.
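One way to realize this is to memoize on the offset $j$, which determines the centered remainder $s[j:n-j]$ (a sketch under that convention):

```python
from functools import lru_cache

def solution(s):
    n = len(s)

    @lru_cache(maxsize=None)
    def solve(j):
        # Optimal chunking of the centered remainder s[j : n - j].
        m = n - 2 * j
        if m == 0:
            return 0
        best = 1
        for k in range(1, m // 2 + 1):
            if s[j:j + k] == s[n - j - k:n - j]:
                best = max(best, 2 + solve(j + k))
        return best

    return solve(0)
```

Each string comparison still costs $O(k)$, which is where the third factor of $n$ comes from.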
Greedy. We do not have to consider all values of $k$, however. A greedy approach, always chipping off the smallest possible chunk, gives optimal results. Let us sketch the proof (in the interest of space, the notation is not fully formal).
Without loss of generality, assume that the greedy algorithm and the optimal algorithm differ in the first chunk they chip off the input string $s$: the greedy algorithm selects a chunk of size $p$, and the optimal algorithm selects a strictly larger chunk of size $q$, $p < q$. Let us denote the corresponding prefixes/suffixes as $P = s[:p] = s[-p:]$ and $Q = s[:q] = s[-q:]$; since $p < q$, $P$ is both a prefix and a suffix of $Q$.
We consider two cases:
1. $2p \le q$. Here the occurrences of $P$ as a prefix and as a suffix of $Q$ do not overlap, so $Q = PXP$, where $X$ is a possibly empty string. Chunking the prefix/suffix $Q$ as $P$, $X$, $P$ therefore yields a longer chunking than the "optimal" single chunk $Q$. Contradiction.
2. $2p > q$. Now the two occurrences of $P$ in $Q$ overlap. Equating them and taking into account $2p > q$ gives us $P = XP_1$ for some prefix $P_1$ of $P$; note that $P_1$ is both a prefix and a suffix of $P$. We can keep applying the same logic to get shorter and shorter strings $P_1, P_2, \dots$, until $2|P_i| \le q$. $P_i$ is both a prefix and a suffix of $P_{i-1}$ (by the same logic that $P_1$ was), and thus of $Q$. This prefix and suffix do not overlap, because $2|P_i| \le q$. So $Q$ can be written as $P_i X P_i$ for some $X$, and could be chunked more finely. Contradiction as in case 1.
Since both cases lead to a contradiction, we conclude that the greedy solution is optimal.
A straightforward implementation of the greedy approach solves the first three sets of inputs. It still takes $O(n^2)$ time, because the string equality test (to determine whether a chunk of size $k$ can be chipped off) takes $O(k)$.
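A sketch of that straightforward implementation (the repeated slicing alone is already quadratic):

```python
def solution(s):
    chunks = 0
    while s:
        for k in range(1, len(s) // 2 + 1):
            # Chip off the smallest matching prefix/suffix pair.
            if s[:k] == s[-k:]:
                chunks += 2
                s = s[k:-k]
                break
        else:
            # No matching pair exists: the remainder is one middle chunk.
            chunks += 1
            s = ""
    return chunks
```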
Rolling hashes. As a final optimization step, we can speed up string comparison using rolling hashes. Let us denote the ASCII value of a character $c$ with $v(c)$, and choose a prime $p$ larger than any such value, e.g. $p = 131$. We can then define the hash of a sequence of characters $c_1 c_2 \dots c_m$ as $h(c_1 \dots c_m) = \left( \sum_{i=1}^{m} v(c_i)\, p^{m-i} \right) \bmod q$, where $q$ is another suitably large prime. As we grow our candidate prefix and suffix, looking for the first matching pair, we only perform the expensive string comparison when the hash values of the prefix and the suffix match. These hash values can be updated in $O(1)$ for each additional character, and strings can be compared in approximately $O(1)$ (neglecting the cases where hash comparison yields a false positive), thus roughly $O(n)$ for the input overall.
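A sketch of the hashed greedy loop under the definition above ($p = 131$ and the Mersenne prime $q = 2^{61} - 1$ are illustrative choices, not the only valid ones):

```python
def solution(s, p=131, q=(1 << 61) - 1):
    chunks, lo, hi = 0, 0, len(s)  # the unprocessed window is s[lo:hi]
    while lo < hi:
        h_pre = h_suf = 0  # hashes of s[lo:lo+k] and s[hi-k:hi]
        power = 1          # p**(k-1) mod q
        found = False
        for k in range(1, (hi - lo) // 2 + 1):
            # Grow the prefix rightwards and the suffix leftwards, O(1) each.
            h_pre = (h_pre * p + ord(s[lo + k - 1])) % q
            h_suf = (ord(s[hi - k]) * power + h_suf) % q
            power = power * p % q
            # Compare the actual strings only on a hash match.
            if h_pre == h_suf and s[lo:lo + k] == s[hi - k:hi]:
                chunks += 2
                lo, hi = lo + k, hi - k
                found = True
                break
        if not found:
            chunks += 1  # the remainder is one middle chunk
            break
    return chunks
```

Keeping the final string comparison makes the answer exact even when two hashes collide.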
Deterministic time complexity. A greedy solution with naive string equality testing is not that bad as long as the chunks are small enough. On the other hand, we could process the entire remaining string to find the smallest chunk in $O(n)$; we can use the Z-algorithm or KMP's failure function for this purpose. But that would again waste a lot of time if the smallest chunk turns out to be small. However, we can make a compromise: choose a length threshold $t$ and use naive equality testing for chunk sizes $k \le t$. If that doesn't yield a solution, process the entire string. The worst-case time complexity is $O(nt + n^2/t)$. Choosing $t = \sqrt{n}$ gives an $O(n\sqrt{n})$ solution.
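A sketch of this compromise, with the standard Z-algorithm handling the expensive phase (the names `smallest_chunk` and `threshold` are ours):

```python
import math

def z_function(t):
    # Standard Z-algorithm: z[i] = length of the longest common
    # prefix of t and t[i:].
    n = len(t)
    z = [0] * n
    l = r = 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and t[z[i]] == t[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z

def smallest_chunk(t, threshold):
    # Cheap phase: naive equality tests for k <= threshold.
    for k in range(1, min(threshold, len(t) // 2) + 1):
        if t[:k] == t[-k:]:
            return k
    # Expensive phase: one O(len(t)) pass.
    # t[:k] == t[-k:] exactly when z[len(t) - k] == k.
    z = z_function(t)
    for k in range(threshold + 1, len(t) // 2 + 1):
        if z[len(t) - k] == k:
            return k
    return None  # no split possible

def solution(s):
    threshold = max(1, math.isqrt(len(s)))
    chunks, t = 0, s
    while t:
        k = smallest_chunk(t, threshold)
        if k is None:
            return chunks + 1  # the remainder is one middle chunk
        chunks += 2
        t = t[k:-k]
    return chunks
```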
A deterministic linear-time solution also exists, although we are not aware of one that is practical to implement in a contest. Let us build a suffix array of $s$ and store for every suffix $s[i:]$ its index $r_i$ in the suffix array. Assume that we have already chipped off $j$ characters on each side of $s$, and that we are trying to determine if $k$ is a good size for the next prefix and suffix; in other words, if $s[j:j+k] = s[n-j-k:n-j]$. The size of a chunk $k$ defines a range $[a, b]$ of relevant suffixes in the suffix array (those that begin with $s[j:j+k]$), and $k$ is a good size exactly when $r_{n-j-k} \in [a, b]$. The bottleneck of this solution is in adjusting the bounds of the range $a$ and $b$ when increasing $k$. Building a suffix tree to traverse and determine the bounds $a$ and $b$, instead of binary searching over a suffix array, leads to a linear-time solution.
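As a toy illustration of the suffix-array check (with a naive quadratic construction and a fresh binary search per query instead of the incremental bound adjustment described above; all names are ours):

```python
import bisect

def build_suffix_array(s):
    # Naive O(n^2 log n) construction -- for illustration only.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    rank = [0] * len(s)
    for idx, start in enumerate(sa):
        rank[start] = idx
    return sa, rank

def is_good_size(s, sa, rank, j, k):
    # Size k is good exactly when s[j:j+k] == s[n-j-k:n-j], i.e. when the
    # suffix starting at n-j-k lies in the suffix-array range of suffixes
    # beginning with s[j:j+k]. Assumes 2 * (j + k) <= len(s).
    n = len(s)
    pattern = s[j:j + k]
    heads = [s[i:i + k] for i in sa]  # O(n*k) here; the real solution avoids this
    a = bisect.bisect_left(heads, pattern)
    b = bisect.bisect_right(heads, pattern)
    return a <= rank[n - j - k] < b
```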