Editorial for Baltic OI '06 P5 - RLE Compression
Submitting an official solution before solving the problem yourself is a bannable offence.
The optimal solution should work in .
First, we decode the input and store the sequence as a sequence of blocks. Each block represents a repetition of one character. Let the number of blocks be . It satisfies 
.
A straightforward dynamic solution leads to time and memory . For each 
 and each 
 we calculate a value 
 — the length of the shortest code of first 
 blocks such that at the end of the code the special character is set to 
.
This algorithm can be accelerated using the following observations. First, the code of a block of repetition of a character  using the special character other than 
 is not longer than the code of the block using the special character 
. Therefore, if 
 then it is always a good choice to code the block using special character 
, thus 
, where 
 is the length of the shortest code of block 
 given 
. So all values of 
 for 
 differ from 
 by the same number 
.
More attention is needed when calculating . We want to encode block 
 in a such way that after encoding it, the special character will be set to 
. This can be done in two ways: with or without switching the special character. If we don't want to switch, then the length of the code will be equal to 
 plus the length of the code of block 
 using 
 as the special character. If we want to switch the special character to 
 somewhere, it is always worthwhile to do it at the end of block of 
. This costs additional 
 characters. The rest of the code will contain 
 character plus the smallest value from the set 
. To reconstruct later the code we need to save only how this particular value 
 was calculated.
We don't have to store  for all 
. We need only values 
 for the current block.
There can be developed a fast data structure with the following operations running in constant time:
- getting the value 
for any
,
 - calculation of 
from
,
 - getting the smallest value 
with
.
 
The key observation for doing it is that two values  may differ at most by 
. This can be proved by a simple induction on the number of blocks.
The task requires from a contestant some short insight into the problem. The more difficult part of the solution should be careful implementation.
Comments