Editorial for COCI '12 Contest 1 #6 Mars
Submitting an official solution before solving the problem yourself is a bannable offence.
We will describe two solutions based on dynamic programming.
For the purposes of both solutions, we will call sets of positions in the sequence the -structures, sets the -structures, sets the -structures and so on. Two bacteria in the same -structure will be called siblings; if they are not siblings, but are in the same -structure, we'll call them first cousins; if they are neither siblings nor first cousins, we'll call them second cousins etc.
The first solution
(devised by Luka Kalinovčić while solving COCI)
Imagine that we are adding bacteria into the sequence from left to right; then the state is described by the position in the sequence and the number of the bacterium placed in that position.
At first glance, these two parameters appear insufficient, but it can be shown that they are in fact sufficient. Let and be two adjacent positions in the sequence, with a bacterium marked prev placed in position . Let us show that we can, without knowledge about the rest of the sequence constructed before position , unambiguously determine which bacteria are candidates for placement in position (thus giving us the set of possible following states, making the state transition in dynamic programming possible). With that goal, let us consider the following cases:
- 1) and are in the same -structure. It is clear that the only candidate for is the sibling of prev.
- 2) Case 1 doesn't apply, but and are in the same -structure. Here candidates for position are the two first cousins of prev.
- 3) None of the previous cases apply, but and are in the same -structure. Here candidates for position are the four second cousins of prev.
- …
- ) None of the previous cases apply, but and are in the same -structure (spanning all positions). Here candidates for position are the most distant cousins of prev.
Notice that this reasoning leads to the exact set of candidates for the next position, since none of the candidates could have been used before (take a moment to think and convince yourself).
The complexity of the algorithm is the number of states, , multiplied by the transition complexity (number of candidates), which, at first glance, totals . Since the number of candidates is in of cases, in of cases, in of cases and so on, the actual complexity is even lower: .
The second solution
Let be the smallest possible length of a subsequence starting with bacterium , ending with bacterium , and including precisely the bacteria which are descendants of the first (lowest) common ancestor of and (denoted by ). The agenda is as follows:
- 1) Compute for all that are siblings.
- 2) Compute for all that are first cousins.
- 3) Compute for all that are second cousins.
- …
- ) Compute for all that are most distant cousins.
The final result will, of course, be the smallest of the numbers obtained in the last, step, since they cover all sequences containing all bacteria.
Let us denote with the repulsion between bacteria and . values in the first step are trivial to obtain, since . Other steps are a bit more complex, so bear with us.
For given and , we can try out all selections of bacteria and that "divide structures and ", i.e. are exactly in the middle of the subsequence we're building between and , where is the cousin closer to than , while is the cousin closer to than . They cannot be too close cousins, since we have to fit half of all descendants of between and , and analogously for – in other words, and must be the two children of .
For fixed , and a selection of bacteria and , the smallest possible length of the segment between and is obviously:
Therefore, is equal to the minimum of this expression over all valid choices of bacteria and .
We need to find a fast method of computing this transition. If, for given and , we select some bacterium , we need to minimize:
for all legal choices of . Notice that this minimum doesn't depend on . Let us denote it by .
We can thus, before computing values of for the current step, precompute values of needed for that step (where is obtained by trying out all valid values of ) and then use the values to obtain values for that step more efficiently.
The complexity is obtained by summing the complexity for computing values and for computing values. Both are at most since they compute values, each of which requires one for loop. Since this loop is often short because of relatively few valid choices, mathematically inclined readers are encouraged to determine whether this complexity is really or somewhat smaller, as in the previous solution.
Comments