Editorial for COCI '12 Contest 1 #6 Mars
We will describe two solutions based on dynamic programming.
For the purposes of both solutions, we will call the sets of positions in the sequence $\{1, 2\}, \{3, 4\}, \{5, 6\}, \ldots$ the $2$-structures, the sets $\{1, 2, 3, 4\}, \{5, 6, 7, 8\}, \ldots$ the $4$-structures, the sets $\{1, \ldots, 8\}, \{9, \ldots, 16\}, \ldots$ the $8$-structures, and so on. Two bacteria in the same $2$-structure will be called siblings; if they are not siblings, but are in the same $4$-structure, we'll call them first cousins; if they are neither siblings nor first cousins, but are in the same $8$-structure, we'll call them second cousins, etc.
The first solution
(devised by Luka Kalinovčić while solving COCI)
Imagine that we are adding bacteria into the sequence from left to right; then the state is described by the position in the sequence and the number of the bacterium placed in that position.
At first glance, these two parameters appear insufficient, but it can be shown that they are in fact sufficient. Let $p$ and $p+1$ be two adjacent positions in the sequence, with a bacterium marked $prev$ placed in position $p$. Let us show that we can, without any knowledge about the rest of the sequence constructed before position $p$, unambiguously determine which bacteria are candidates for placement in position $p+1$ (thus giving us the set of possible following states, making the state transition in dynamic programming possible). With that goal, let us consider the following cases:
- 1) $p$ and $p+1$ are in the same $2$-structure. It is clear that the only candidate for position $p+1$ is the sibling of $prev$.
- 2) Case 1 doesn't apply, but $p$ and $p+1$ are in the same $4$-structure. Here the candidates for position $p+1$ are the two first cousins of $prev$.
- 3) None of the previous cases apply, but $p$ and $p+1$ are in the same $8$-structure. Here the candidates for position $p+1$ are the four second cousins of $prev$.
- …
- $\log_2 N$) None of the previous cases apply, but $p$ and $p+1$ are in the same $N$-structure (spanning all positions). Here the candidates for position $p+1$ are the $N/2$ most distant cousins of $prev$.
Notice that this reasoning leads to the exact set of candidates for the next position, since none of the candidates could have been used before (take a moment to think and convince yourself).
The complexity of the algorithm is the number of states, $N^2$, multiplied by the transition complexity (the number of candidates), which, at first glance, totals $O(N^3)$. Since the number of candidates is $1$ in $1/2$ of the cases, $2$ in $1/4$ of the cases, $4$ in $1/8$ of the cases and so on, the actual complexity is even lower: $O(N^2 \log N)$.
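To make the transition concrete, here is a compact sketch of this solution. It rests on assumptions that are not spelled out in this editorial: bacteria and positions are 0-indexed, `d[a][b]` is the repulsion matrix read from the input, and the quantity being minimized is the sum of repulsions between adjacent bacteria (the same quantity the second solution's recurrence below works with), so treat it as an outline rather than the official implementation.

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;                                        // n = 2^k bacteria, 0-indexed
    cin >> n;
    vector<vector<long long>> d(n, vector<long long>(n));
    for (auto &row : d)
        for (auto &x : row) cin >> x;             // d[a][b] = repulsion between a and b

    const long long INF = LLONG_MAX / 4;
    // dp[p][b] = minimal total length of a valid prefix that fills positions 0..p
    //            and has bacterium b placed at position p
    vector<vector<long long>> dp(n, vector<long long>(n, INF));
    for (int b = 0; b < n; ++b) dp[0][b] = 0;     // any bacterium may be placed first

    for (int p = 0; p + 1 < n; ++p) {
        // the smallest structure containing both positions p and p+1 has size 2^j
        int j = 1;
        while ((p >> j) != ((p + 1) >> j)) ++j;

        for (int prev = 0; prev < n; ++prev) {
            if (dp[p][prev] == INF) continue;
            // candidates for position p+1: the half of prev's 2^j-structure
            // that does not contain prev (exactly 2^(j-1) bacteria)
            int base = (prev >> j) << j, half = 1 << (j - 1);
            int from = (prev & half) ? base : base + half;
            for (int c = from; c < from + half; ++c)
                dp[p + 1][c] = min(dp[p + 1][c], dp[p][prev] + d[prev][c]);
        }
    }
    cout << *min_element(dp[n - 1].begin(), dp[n - 1].end()) << '\n';
}
```

Enumerating only the candidate half-block, as done here, is what gives the $O(N^2 \log N)$ bound; looping over all $N$ bacteria and filtering would still fit the straightforward $O(N^3)$ estimate.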
The second solution
Let $f(a, b)$ be the smallest possible length of a subsequence starting with bacterium $a$, ending with bacterium $b$, and including precisely the bacteria which are descendants of the first (lowest) common ancestor of $a$ and $b$ (denoted by $LCA(a, b)$). The agenda is as follows:
- 1) Compute $f(a, b)$ for all $a, b$ that are siblings.
- 2) Compute $f(a, b)$ for all $a, b$ that are first cousins.
- 3) Compute $f(a, b)$ for all $a, b$ that are second cousins.
- …
- $\log_2 N$) Compute $f(a, b)$ for all $a, b$ that are most distant cousins.
The final result will, of course, be the smallest of the numbers obtained in the last, $\log_2 N$-th, step, since they cover all sequences containing all $N$ bacteria.
Let us denote with $d(a, b)$ the repulsion between bacteria $a$ and $b$. The $f$ values in the first step are trivial to obtain, since $f(a, b) = d(a, b)$ for siblings. Other steps are a bit more complex, so bear with us.
For given $a$ and $b$, we can try out all selections of bacteria $c$ and $d$ that "divide the structures of $a$ and $b$", i.e. are exactly in the middle of the subsequence we're building between $a$ and $b$, where $c$ is the cousin closer to $a$ than to $b$, while $d$ is the cousin closer to $b$ than to $a$. They cannot be too close cousins, since we have to fit half of all the descendants of $LCA(a, b)$ between $a$ and $c$, and analogously for $d$ and $b$ – in other words, $LCA(a, c)$ and $LCA(d, b)$ must be the two children of $LCA(a, b)$.
For fixed $a$ and $b$, and a selection of bacteria $c$ and $d$, the smallest possible length of the segment between $a$ and $b$ is obviously:

$$f(a, c) + d(c, d) + f(d, b)$$

Therefore, $f(a, b)$ is equal to the minimum of this expression over all valid choices of bacteria $c$ and $d$.
We need to find a fast method of computing this transition. If, for given $a$ and $b$, we select some bacterium $d$, we need to minimize:

$$f(a, c) + d(c, d)$$

for all legal choices of $c$. Notice that this minimum doesn't depend on $b$. Let us denote it by $g(a, d)$.
We can thus, before computing the values of $f$ for the current step, precompute the values of $g$ needed for that step (where $g(a, d)$ is obtained by trying out all valid values of $c$) and then use the $g$ values to obtain the $f$ values for that step more efficiently.
The complexity is obtained by summing the complexity of computing the $f$ values and of computing the $g$ values. Both are at most $O(N^3)$ since they compute $O(N^2)$ values, each of which requires one for loop. Since this loop is often short because there are relatively few valid choices, mathematically inclined readers are encouraged to determine whether this complexity is really $O(N^3)$ or somewhat smaller, as in the previous solution.
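The agenda and the $g$ trick translate into code along the following lines. As with the earlier sketch, this is an outline rather than the official implementation: bacteria are 0-indexed, `d` holds the repulsion matrix, the middle bacterium is named `dd` to avoid clashing with the matrix name, and candidates are filtered with the small `kinship` helper instead of enumerating the exact index ranges, which keeps the sketch within the $O(N^3)$ bound discussed above.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Smallest j with x >> j == y >> j (0-based indices, x != y):
// j = 1 means siblings, j = 2 first cousins, and so on.
int kinship(int x, int y) {
    int j = 0;
    while ((x >> j) != (y >> j)) ++j;
    return j;
}

int main() {
    int n;                                         // n = 2^k bacteria, 0-indexed
    cin >> n;
    int k = 0;
    while ((1 << k) < n) ++k;                      // k = log2(n)
    vector<vector<long long>> d(n, vector<long long>(n));
    for (auto &row : d)
        for (auto &x : row) cin >> x;              // d[a][b] = repulsion between a and b

    const long long INF = LLONG_MAX / 4;
    vector<vector<long long>> f(n, vector<long long>(n, INF));
    vector<vector<long long>> g(n, vector<long long>(n, INF));

    // Step 1: siblings -- f(a, b) = d(a, b).
    for (int a = 0; a < n; ++a) f[a][a ^ 1] = d[a][a ^ 1];

    // Steps 2..log2(N): pairs whose smallest common structure has size 2^j.
    for (int j = 2; j <= k; ++j) {
        // g[a][dd] = min over c with kinship(a, c) == j - 1 of f[a][c] + d[c][dd]
        for (int a = 0; a < n; ++a)
            for (int dd = 0; dd < n; ++dd) {
                if (kinship(a, dd) != j) continue;
                for (int c = 0; c < n; ++c)
                    if (kinship(a, c) == j - 1)
                        g[a][dd] = min(g[a][dd], f[a][c] + d[c][dd]);
            }
        // f[a][b] = min over dd with kinship(dd, b) == j - 1 of g[a][dd] + f[dd][b]
        for (int a = 0; a < n; ++a)
            for (int b = 0; b < n; ++b) {
                if (kinship(a, b) != j) continue;
                for (int dd = 0; dd < n; ++dd)
                    if (kinship(dd, b) == j - 1)
                        f[a][b] = min(f[a][b], g[a][dd] + f[dd][b]);
            }
    }

    // The answer: the best f value over most distant cousins (kinship exactly k).
    long long best = INF;
    for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
            if (a != b && kinship(a, b) == k) best = min(best, f[a][b]);
    cout << best << '\n';
}
```

Enumerating only the valid values of `c` and `dd` directly (they form contiguous aligned half-blocks, just like the candidate sets in the first solution) removes the filtering overhead, but does not change the asymptotic bound.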