Editorial for COCI '12 Contest 1 #6 Mars
We will describe two solutions based on dynamic programming.
For the purposes of both solutions, we will call the sets of positions in the sequence $\{1, 2\}, \{3, 4\}, \{5, 6\}, \ldots$ the $2$-structures, the sets $\{1, 2, 3, 4\}, \{5, 6, 7, 8\}, \ldots$ the $4$-structures, the sets $\{1, \ldots, 8\}, \{9, \ldots, 16\}, \ldots$ the $8$-structures, and so on. Two bacteria in the same $2$-structure will be called siblings; if they are not siblings, but are in the same $4$-structure, we'll call them first cousins; if they are neither siblings nor first cousins, but are in the same $8$-structure, we'll call them second cousins, etc.
The first solution
(devised by Luka Kalinovčić while solving COCI)
Imagine that we are adding bacteria into the sequence from left to right; then the state is described by the position in the sequence and the number of the bacterium placed in that position.
At first glance, these two parameters appear insufficient, but it can be shown that they are in fact sufficient. Let $p$ and $p+1$ be two adjacent positions in the sequence, with a bacterium marked $prev$ placed in position $p$. Let us show that we can, without any knowledge about the rest of the sequence constructed before position $p$, unambiguously determine which bacteria are candidates for placement in position $p+1$ (thus giving us the set of possible following states, making the state transition in dynamic programming possible). With that goal, let us consider the following cases:
- 1) $p$ and $p+1$ are in the same $2$-structure. It is clear that the only candidate for position $p+1$ is the sibling of $prev$.
- 2) Case 1 doesn't apply, but $p$ and $p+1$ are in the same $4$-structure. Here the candidates for position $p+1$ are the two first cousins of $prev$.
- 3) None of the previous cases apply, but $p$ and $p+1$ are in the same $8$-structure. Here the candidates for position $p+1$ are the four second cousins of $prev$.
- …
- $\log_2 N$) None of the previous cases apply, but $p$ and $p+1$ are in the same $N$-structure (spanning all positions). Here the candidates for position $p+1$ are the $N/2$ most distant cousins of $prev$.
Notice that this reasoning leads to the exact set of candidates for the next position, since none of the candidates could have been used before (take a moment to think and convince yourself).
The complexity of the algorithm is the number of states, $N^2$, multiplied by the transition complexity (the number of candidates), which, at first glance, totals $O(N^3)$. Since the number of candidates is $1$ in $1/2$ of the cases, $2$ in $1/4$ of the cases, $4$ in $1/8$ of the cases and so on, the actual complexity is even lower: $O(N^2 \log N)$.
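To make the transition concrete, here is a compact sketch of this solution. It rests on assumptions that are not spelled out in this editorial: bacteria and positions are 0-indexed, `d[a][b]` is the repulsion matrix read from the input, and the quantity being minimized is the sum of repulsions between adjacent bacteria (the same quantity the second solution's recurrence below works with), so treat it as an outline rather than the official implementation.

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;                                        // n = 2^k bacteria, 0-indexed
    cin >> n;
    vector<vector<long long>> d(n, vector<long long>(n));
    for (auto &row : d)
        for (auto &x : row) cin >> x;             // d[a][b] = repulsion between a and b

    const long long INF = LLONG_MAX / 4;
    // dp[p][b] = minimal total length of a valid prefix that fills positions 0..p
    //            and has bacterium b placed at position p
    vector<vector<long long>> dp(n, vector<long long>(n, INF));
    for (int b = 0; b < n; ++b) dp[0][b] = 0;     // any bacterium may be placed first

    for (int p = 0; p + 1 < n; ++p) {
        // the smallest structure containing both positions p and p+1 has size 2^j
        int j = 1;
        while ((p >> j) != ((p + 1) >> j)) ++j;

        for (int prev = 0; prev < n; ++prev) {
            if (dp[p][prev] == INF) continue;
            // candidates for position p+1: the half of prev's 2^j-structure
            // that does not contain prev (exactly 2^(j-1) bacteria)
            int base = (prev >> j) << j, half = 1 << (j - 1);
            int from = (prev & half) ? base : base + half;
            for (int c = from; c < from + half; ++c)
                dp[p + 1][c] = min(dp[p + 1][c], dp[p][prev] + d[prev][c]);
        }
    }
    cout << *min_element(dp[n - 1].begin(), dp[n - 1].end()) << '\n';
}
```

Enumerating only the candidate half-block, as done here, is what gives the $O(N^2 \log N)$ bound; looping over all $N$ bacteria and filtering would still fit the straightforward $O(N^3)$ estimate.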
The second solution
Let $f(a, b)$ be the smallest possible length of a subsequence starting with bacterium $a$, ending with bacterium $b$, and including precisely the bacteria which are descendants of the first (lowest) common ancestor of $a$ and $b$ (denoted by $LCA(a, b)$). The agenda is as follows:
- 1) Compute $f(a, b)$ for all $a, b$ that are siblings.
- 2) Compute $f(a, b)$ for all $a, b$ that are first cousins.
- 3) Compute $f(a, b)$ for all $a, b$ that are second cousins.
- …
- $\log_2 N$) Compute $f(a, b)$ for all $a, b$ that are most distant cousins.
The final result will, of course, be the smallest of the numbers obtained in the last, $\log_2 N$-th, step, since they cover all sequences containing all $N$ bacteria.
Let us denote with $d(a, b)$ the repulsion between bacteria $a$ and $b$. The $f$ values in the first step are trivial to obtain, since $f(a, b) = d(a, b)$ for siblings. Other steps are a bit more complex, so bear with us.
For given $a$ and $b$, we can try out all selections of bacteria $c$ and $d$ that "divide the structures of $a$ and $b$", i.e. are exactly in the middle of the subsequence we're building between $a$ and $b$, where $c$ is the cousin closer to $a$ than to $b$, while $d$ is the cousin closer to $b$ than to $a$. They cannot be too close cousins, since we have to fit half of all the descendants of $LCA(a, b)$ between $a$ and $c$, and analogously for $d$ and $b$ – in other words, $LCA(a, c)$ and $LCA(d, b)$ must be the two children of $LCA(a, b)$.
For fixed $a$ and $b$, and a selection of bacteria $c$ and $d$, the smallest possible length of the segment between $a$ and $b$ is obviously:

$$f(a, c) + d(c, d) + f(d, b)$$

Therefore, $f(a, b)$ is equal to the minimum of this expression over all valid choices of bacteria $c$ and $d$.
We need to find a fast method of computing this transition. If, for given $a$ and $b$, we select some bacterium $d$, we need to minimize:

$$f(a, c) + d(c, d)$$

for all legal choices of $c$. Notice that this minimum doesn't depend on $b$. Let us denote it by $g(a, d)$.
We can thus, before computing the values of $f$ for the current step, precompute the values of $g$ needed for that step (where $g(a, d)$ is obtained by trying out all valid values of $c$) and then use the $g$ values to obtain the $f$ values for that step more efficiently.
The complexity is obtained by summing the complexity of computing the $f$ values and of computing the $g$ values. Both are at most $O(N^3)$ since they compute $O(N^2)$ values, each of which requires one for loop. Since this loop is often short because there are relatively few valid choices, mathematically inclined readers are encouraged to determine whether this complexity is really $O(N^3)$ or somewhat smaller, as in the previous solution.
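The agenda and the $g$ trick translate into code along the following lines. As with the earlier sketch, this is an outline rather than the official implementation: bacteria are 0-indexed, `d` holds the repulsion matrix, the middle bacterium is named `dd` to avoid clashing with the matrix name, and candidates are filtered with the small `kinship` helper instead of enumerating the exact index ranges, which keeps the sketch within the $O(N^3)$ bound discussed above.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Smallest j with x >> j == y >> j (0-based indices, x != y):
// j = 1 means siblings, j = 2 first cousins, and so on.
int kinship(int x, int y) {
    int j = 0;
    while ((x >> j) != (y >> j)) ++j;
    return j;
}

int main() {
    int n;                                         // n = 2^k bacteria, 0-indexed
    cin >> n;
    int k = 0;
    while ((1 << k) < n) ++k;                      // k = log2(n)
    vector<vector<long long>> d(n, vector<long long>(n));
    for (auto &row : d)
        for (auto &x : row) cin >> x;              // d[a][b] = repulsion between a and b

    const long long INF = LLONG_MAX / 4;
    vector<vector<long long>> f(n, vector<long long>(n, INF));
    vector<vector<long long>> g(n, vector<long long>(n, INF));

    // Step 1: siblings -- f(a, b) = d(a, b).
    for (int a = 0; a < n; ++a) f[a][a ^ 1] = d[a][a ^ 1];

    // Steps 2..log2(N): pairs whose smallest common structure has size 2^j.
    for (int j = 2; j <= k; ++j) {
        // g[a][dd] = min over c with kinship(a, c) == j - 1 of f[a][c] + d[c][dd]
        for (int a = 0; a < n; ++a)
            for (int dd = 0; dd < n; ++dd) {
                if (kinship(a, dd) != j) continue;
                for (int c = 0; c < n; ++c)
                    if (kinship(a, c) == j - 1)
                        g[a][dd] = min(g[a][dd], f[a][c] + d[c][dd]);
            }
        // f[a][b] = min over dd with kinship(dd, b) == j - 1 of g[a][dd] + f[dd][b]
        for (int a = 0; a < n; ++a)
            for (int b = 0; b < n; ++b) {
                if (kinship(a, b) != j) continue;
                for (int dd = 0; dd < n; ++dd)
                    if (kinship(dd, b) == j - 1)
                        f[a][b] = min(f[a][b], g[a][dd] + f[dd][b]);
            }
    }

    // The answer: the best f value over most distant cousins (kinship exactly k).
    long long best = INF;
    for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
            if (a != b && kinship(a, b) == k) best = min(best, f[a][b]);
    cout << best << '\n';
}
```

Enumerating only the valid values of `c` and `dd` directly (they form contiguous aligned half-blocks, just like the candidate sets in the first solution) removes the filtering overhead, but does not change the asymptotic bound.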