Editorial for IOI '11 P6 - Parrots
Submitting an official solution before solving the problem yourself is a bannable offence.
For this task, contestants are given a message  of length 
, which consists of a sequence of integers between 
 and 
, inclusive, and they are asked to find an encoding and decoding scheme such that the encoded message 
 are invariant up to permutations. The encoded message should be as concise as possible. The scores will be rewarded based on the ratio of the length of the encoded message 
 to the length of the original message 
.

There are many ways to solve this task. It will be instructive to look at the first few subtasks.
Subtask 1
For this subtask, recall that the original message  is just a sequence of two possible numbers, 
 or 
. Assuming that the original message is indexed from 
 to 
, one can instead choose to send the positions in the original message where all number 
's are located. This technique uses at most 
 integers to send the message.
Subtask 2
For subtask 2, since the numbers in the original message  could be from 
 to 
, they can be encoded using only 
 bits. However, this subtask allows one to use up to 
 bits per number in the encoded message 
. Hence, you can encode the position of each number using the higher 
 bits (similar to the solution to subtask 1), and use the lower 
 bits to encode the actual number in the original message 
, i.e.
where  denotes the 
 number in the original message.
Note that the encoded numbers are also ordered, i.e. , for 
. Therefore, when we would like to decode the message from the shuffled array 
, we can sort it to get 
. Then, to obtain the original message from the sorted 
 should be pretty obvious.
This method also uses at most  parrots, which is satisfactory for this subtask.
Subtask 3
Thinking about the first two subtasks should get us familiar with the approach. There are many roads to take from here. However, to formalize the settings, we shall think of a general representation first.
Because we shall keep integers in the encoded array, a natural representation would be to use a sorted sequence. Note that this representation is very robust against permutation; provided that the encoded array  is sorted, by sorting a shuffled array 
, we can easily recover the encoded array 
, as in subtask 2.
We shall try to extend the approach in subtask 2. However, for subtasks 3–5, we do not have "extra" bits for keeping the position data. In these cases, we look at the original message as a sequence of binary digits (or bits),  and 
, by representing each number using only 
 bits. Thus, the original message of length 
 will have 
 bits.
This leads to a way to solve subtask 3. Note that the original message  contains at most 
 bits. For now, let 
 be the 
 bit of the original message, indexing from 
 to 
. In this case, it is sufficient to take 
 bits to represent all positions of every single bit 
, and use one last bit to represent the actual data. Hence, the encoding scheme looks like the following:
therefore, it takes  bits to represent possible positions and another bit to keep the data. That is, for each bit 
, we shall encode it as 
. The length of the encoded message for each test case of this subtask is 
 bytes.
Subtask 4
For this subtask, there are  bit positions; and it requires 
 bits just to represent the positions. Hence, there is no way to include the information of each bit of the original message in the encoding. If we recall the technique we use to solve subtask 1, we can improve the technique we used in subtask 3 by choosing to encode the positions of all 
 bits. This approach again sends at most 
 bytes.
Subtask 5
For this subtask, contestants are allowed to obtain some partial scores, depending on the ratio of the encoding message to the original message. In our point of view, to obtain nearly perfect scores like  is relatively much easier than to obtain the full points. We consider the last 
 points to be the reward to those who choose to implement more sophisticated methods of encoding that achieve better results than our expectations.
We discuss many potential solutions that might achieve some partial scores.
A 
 approach
We are going to generalize the solution to the previous subtask case since the number of bits is more than . In this case, there are at most 
 bit positions.
To deal with this issue, we partition the message into  pairs of bits. We define the 
 bit pair 
 to be 
. Again, the bit pair 
 should be indexed from 
 to 
.
Note that there are  possible values for each bit pair, which can be represented with integers from 
 to 
. One possible method is to send the positions of the bit pair (which uses 
 bits to encode) in the encoding data, and then the actual bit pair data will be encoded as the number of occurrences of such positions.
Using this approach, for each pair of bits, we may have to send at most  bytes. Therefore, we send at most 
 bytes. Contestants will achieve 
 points on this subtask for implementing this method.
A 
 approach
With a simple observation, we can reduce the size of the encoded messages by roughly a factor of two. Note that the best case (in terms of encoding length) for the previous encoding method is when the original message only contains zero bits, and the worst case is when the message contains only one bits. We shall try to get the average of these two cases.
Consider two encoding schemes, called ONE and ZERO. For a pair of bits , the ONE scheme sends 
 copies of 
, but the ZERO scheme sends 
 copies. Note that both schemes work, provided that the decoder knows which of the schemes the encoder actually uses.
For each pair of bits, the sum of the number of encoded bytes used by both schemes is always  bytes. Considering all pairs, the sum is 
. Now, if we choose the scheme with the lower number of encoding data, we will need at most half of this number, i.e., at most 
 bytes.
There are many ways to signal the decoder which encoding scheme the encoder uses without too much penalty on the ratio. For example, we can add  copies of 
 iff the ZERO scheme is used. This will make the ratio go up by 
. However, since 
, the worst ratio is at most 
. Contestants will achieve 
 points on this subtask for implementing this method.
Better approaches
There are many other approaches that give a better compression ratio.
One byte to 4 numbers
Consider the number of encoded messages of length no greater than , only consisting of numbers 
, 
, 
, and 
 (
 is one such message). To analyze this, it might be easier to consider an equivalent scheme: encoded messages of length exactly 
, only consisting of numbers 
, 
, 
, 
, and BLANK. There are 
 of them, which is enough to encode one byte of the original message.
Since the original message is at most  bytes long, it is possible to allocate 
 numbers per byte (
, 
, 
, and 
 for the first byte, for example), and therefore numbers from 
 to 
 are sufficient. This scheme yields a compression ratio of at most 
.
Optimal ratio
Theoretically, one can encode  bytes of data to the minimum of 
 bytes of encoded message without loss of information. There are 
 different encoded messages using no more than 
 bytes. This is just more than 
, the number of different 
-byte original messages. The compression ratio achieved is about 
. For smaller data, the ratio is even smaller.
This method yields a perfect score for this task. However, implementation of this scheme is comparatively difficult.
Comments