Editorial for IOI '09 P7 - Regions
Submitting an official solution before solving the problem yourself is a bannable offence.
Although the employees are already assigned numbers in the input, the numbers can be reassigned in a way that makes them more useful. The supervisor relationships clearly organize the employees into a tree. Assign the new employee numbers in a pre-order walk of the tree3. Figure 1 shows an example of such a numbering.
3 A pre-order walk of a tree first processes the root of that tree, then recursively processes each sub-tree in turn.
Figure 1: An example of numbering employees by a pre-order walk. The bottom numbers indicate the range of employee numbers in each sub-tree.
A useful property of this numbering is that all the employees in a sub-tree have sequential numbers. For a given employee , let 
 be the range of employee numbers managed by 
. Notice that for a region, we can construct an ordered array of all the interval end-points for that region, and a list of all employees in that region. This can be done during the assignment of numbers in linear time.
Now let us consider how to answer queries . Let the sizes of the regions be 
 and 
 respectively. Given this data structure, a natural solution is to consider every pair of employees 
 from these regions and check whether 
 lies in the interval 
. However, this will take 
 time per query, which we can improve upon.
The interval end-points for region  divide the integers into contiguous blocks. All employees in the same block have the same managers from 
, and we can precompute the number of such managers for each such block. This gives us a faster way to answer queries. Rather than comparing every employee in 
 with every block for 
, we can observe that both are ordered by employee ID. Thus, one can maintain an index into each list, and in each step advance whichever index is lagging behind the other. Since each index traverses a list once, this takes 
 time.
Using just this query mechanism can still take  time, because all the queries might involve large regions. However, it is sufficient to earn the points for the tests where no region has more than 
 employees.
Precomputing queries
In the query algorithm above, it is also possible to replace the list of employees in  with the entire list of employees, and thus compute the answer to all queries for a particular 
. This still requires only a single pass over the blocks for 
, so it takes 
 time to produce all the answers for a particular 
. Similarly, one can iterate over all interval end-points while fixing 
, giving all answers for a particular 
.
This allows all possible queries to be pre-computed in  time and 
 memory. This is sufficient to earn the points for the tests where 
.
This algorithm is too slow and uses too much memory to solve all the tests. However, it is not necessary to precompute all answers, just the most expensive ones. We will precompute the answers involving regions with size at least . There are obviously at most 
 such regions, so this will take 
 time and 
 memory. The remaining queries involve only small regions, so they can be answered in 
 time. Choosing 
 gives 
 time and 
 memory, which is sufficient for a full score.
Caching queries
As an alternative to precomputation, one can cache the results of all queries, and take the answer from the cache if the same query is made again. Let  be the number of unique queries. The cost of maintaining the query cache depends on the data structure used; a balanced binary tree gives 
 overhead for this.
Combining the cache with the  algorithm is sufficient to achieve the points for tests that have either no more than 
 employees per region (because this is the case even without the cache), as well as the cases with no more than 
 regions (since the total cost of all distinct queries together is 
).
To achieve a full score with a cache rather than precomputation, one must use a better method for answering queries. Suppose we have a block in , and wish to find all matching employees from 
. While we have previously relied on a linear walk over the employees from 
, we can instead use a binary search to find the start and end of the range in 
 time. This allows the entire query to be answered in 
 time. A similar transformation (binary searching the blocks for each employee in 
) gives 
 time for each query.
Now when answering each query, choose the best out of the , 
 and 
 query mechanisms. To establish an upper bound on run-time, we will make assumptions about which method is chosen to answer particular types of queries.
Again, divide the problem into large regions with at least  employees and the rest. For queries involving one of the large regions, use the 
 algorithm (where 
 and 
 are respectively the smaller and larger of 
 and 
). The caching of queries ensures that this contributes no more than 
 time. For the remaining queries, use an 
 algorithm. The smaller regions have at most 
 employees, so this contributes 
 time.
The optimal value of  occurs when the two parts account for equal time. Solving for this optimal 
 gives a bound of 
 for answering non-duplicate queries; combined with the cost for the query cache, this gives an algorithm with time complexity 
 and memory complexity 
.
The time bound is marginally worse than for the first solution, but in practical terms this solution runs at about the same speed and uses significantly less memory.
Comments