Editorial for IOI '11 P2 - Race - Olympiads Online Judge

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Given a tree $T$ with $N$ nodes whose edges have lengths, this task asks for a path $P^*$ of total length exactly $K$ that uses the minimum number of edges. It looks like a usual dynamic programming task. However, when $K$ is large, another approach is required.
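For checking faster solutions on small inputs, a brute-force reference can help. The sketch below is a hypothetical Python helper (`race_brute` is our name, not the task's interface). It assumes the following: nodes are numbered $0, \dots, N-1$; edges are given as `(a, b, length)` triples with non-negative integer lengths; $K \ge 1$; and the answer is $-1$ when no path exists. It walks the tree from every node, so it takes $\mathcal O(N^2)$ time.

```python
def race_brute(n, k, edges):
    """Fewest edges on a path of total length exactly k, or -1 (O(N^2) reference)."""
    adj = [[] for _ in range(n)]
    for a, b, length in edges:
        adj[a].append((b, length))
        adj[b].append((a, length))
    best = -1
    for s in range(n):
        # Iterative DFS from s; each entry is (node, parent, length from s, edges from s).
        stack = [(s, -1, 0, 0)]
        while stack:
            v, par, dist, cnt = stack.pop()
            if dist == k and (best == -1 or cnt < best):
                best = cnt
            for to, length in adj[v]:
                if to != par:
                    stack.append((to, v, dist + length, cnt + 1))
    return best


print(race_brute(4, 3, [(0, 1, 1), (1, 2, 2), (1, 3, 4)]))  # 2 (the path 0-1-2)
```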

The model solution for this task follows a divide-and-conquer approach.

Consider a node $u$ in the tree. There are two possible cases: either node $u$ belongs to the solution path $P^*$, or it does not.

In the second case, we can delete node $u$ from the tree and break it into smaller trees. We can then recurse on each of the resulting trees to find the solution.

With this general approach in mind, we have to answer the following questions.

How do we find the best path that contains node $u$?
How should we choose $u$ to achieve a better running time?

Note that the second question is very important because if we can guarantee that the sizes of all resulting trees are small, we can bound the number of recursion levels.

Finding the solution containing $u$ ~u~

Consider the case where $P^*$ contains node $u$. Let us first consider a simpler question: does there exist a path of length exactly $K$ that contains $u$?

If $u$ is one of the endpoints of $P^*$, we can find the path with a single depth-first search (DFS) from $u$.
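Here is a minimal sketch of this endpoint case. It assumes `adj[v]` is a list of `(neighbor, length)` pairs with non-negative lengths, and the helper names are ours. One DFS from $u$ tracks the length and edge count of every path starting at $u$. It keeps the fewest edges among the paths whose length is exactly $K$.

```python
def build_adj(n, edges):
    """Adjacency lists of (neighbor, length) pairs for an undirected tree."""
    adj = [[] for _ in range(n)]
    for a, b, length in edges:
        adj[a].append((b, length))
        adj[b].append((a, length))
    return adj


def best_from_endpoint(adj, u, k):
    """Fewest edges on a path that starts at u and has total length k, or -1."""
    best = -1
    stack = [(u, -1, 0, 0)]  # (node, parent, length from u, edges from u)
    while stack:
        v, par, dist, cnt = stack.pop()
        if dist == k and cnt > 0 and (best == -1 or cnt < best):
            best = cnt
        for to, length in adj[v]:
            # Lengths are assumed non-negative, so a prefix longer than k is pruned.
            if to != par and dist + length <= k:
                stack.append((to, v, dist + length, cnt + 1))
    return best


adj = build_adj(4, [(0, 1, 1), (1, 2, 2), (1, 3, 4)])
print(best_from_endpoint(adj, 0, 3))  # 2 (the path 0-1-2)
```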

However, if $u$ is "inside" $P^*$, then two of $u$'s adjacent nodes $x$ and $y$ must also be in $P^*$. Thus, we need to find $x$ and $y$.

Consider some node $w$ adjacent to $u$. With one application of DFS, we can find the set $L_w$ of the lengths of all paths that start at $u$ and contain the edge $(u, w)$.
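The sets $L_w$ can be computed directly. For the existence question, the pair condition can then be checked naively over all pairs of neighbors. The Python sketch below does exactly that, assuming `adj[v]` is a list of `(neighbor, length)` pairs; the helper names are hypothetical. It only illustrates the definitions: the pairwise check is quadratic in the sizes of the sets, not linear.

```python
def build_adj(n, edges):
    """Adjacency lists of (neighbor, length) pairs for an undirected tree."""
    adj = [[] for _ in range(n)]
    for a, b, length in edges:
        adj[a].append((b, length))
        adj[b].append((a, length))
    return adj


def branch_lengths(adj, u, w, first_length):
    """The set L_w: lengths of all paths that start at u and contain edge (u, w)."""
    lengths = set()
    stack = [(w, u, first_length)]  # (node, parent, length from u)
    while stack:
        v, par, dist = stack.pop()
        lengths.add(dist)
        for to, length in adj[v]:
            if to != par:
                stack.append((to, v, dist + length))
    return lengths


def exists_through(adj, u, k):
    """Is there a path of length exactly k that contains u?"""
    sets = [branch_lengths(adj, u, w, length) for w, length in adj[u]]
    if any(k in s for s in sets):  # u is an endpoint of the path
        return True
    for i in range(len(sets)):  # u is inside: two different neighbors x and y
        for j in range(i + 1, len(sets)):
            if any(k - lx in sets[j] for lx in sets[i]):
                return True
    return False


adj = build_adj(4, [(0, 1, 1), (1, 2, 2), (1, 3, 4)])
print(exists_through(adj, 1, 3))  # True (1 from branch 0, plus 2 from branch 2)
```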

Hence, we need two different neighbors $x$ and $y$ of $u$ such that there exist $\ell_x \in L_x$ and $\ell_y \in L_y$ with $\ell_x+\ell_y = K$. We find them by running a DFS from $u$ through every edge $(u, w)$, one neighbor $w$ at a time, with careful bookkeeping in an array $A[0, \dots, K]$ of size $K+1$. Before recording the lengths of $L_w$ in $A$, we look up $A[K-\ell]$ for each $\ell \in L_w$. This way, every match pairs $w$ with a previously processed neighbor. To minimize the number of edges, $A[d]$ stores the fewest edges over all paths of length $d$ found so far.
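Here is a minimal sketch of this bookkeeping for the minimum-edge version. It assumes `adj[v]` holds `(neighbor, length)` pairs with non-negative lengths, and the names are ours. $A[d]$ stores the fewest edges of a path from $u$ of length $d$ among the branches processed so far. $A[0] = 0$ covers the case where $u$ is an endpoint. Each branch is queried against $A$ before it is inserted, so $x$ and $y$ are always different neighbors.

```python
def build_adj(n, edges):
    """Adjacency lists of (neighbor, length) pairs for an undirected tree."""
    adj = [[] for _ in range(n)]
    for a, b, length in edges:
        adj[a].append((b, length))
        adj[b].append((a, length))
    return adj


def best_through(adj, u, k):
    """Fewest edges on a path of length exactly k that contains u, or -1."""
    INF = float("inf")
    A = [INF] * (k + 1)
    A[0] = 0  # the empty path: u itself is an endpoint
    best = INF
    for w, length in adj[u]:
        # Collect (length, edges) pairs of all paths from u through edge (u, w).
        branch = []
        stack = [(w, u, length, 1)]
        while stack:
            v, par, dist, cnt = stack.pop()
            if dist > k:
                continue  # with non-negative lengths, extensions are too long as well
            branch.append((dist, cnt))
            for to, step in adj[v]:
                if to != par:
                    stack.append((to, v, dist + step, cnt + 1))
        for dist, cnt in branch:  # query first, against earlier branches only
            best = min(best, cnt + A[k - dist])
        for dist, cnt in branch:  # then record this branch
            A[dist] = min(A[dist], cnt)
    return -1 if best == INF else best


adj = build_adj(4, [(0, 1, 1), (1, 2, 2), (1, 3, 4)])
print(best_through(adj, 1, 3))  # 2
```

For brevity this version allocates $A$ on every call, which costs $\mathcal O(K)$. A full solution would allocate it once and reset only the entries it wrote.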

The running time for this step is $\mathcal O(N)$, provided that $A$ is allocated only once and, after each step, only the entries that were written are reset.

Finding the right node

Our goal is to find a node $u$ such that, after deleting $u$, all resulting trees are sufficiently "small." Specifically, we want every remaining tree to have at most $N/2$ nodes. We shall refer to such a node $u$ as the central node.

It is not obvious that such a node exists, so let us argue that first.

Pick an arbitrary node $v$ as a candidate. Denote by $T' = T \setminus \{v\}$ the forest obtained by deleting $v$ from $T$. For each node $w$ adjacent to $v$, denote by $T_w$ the tree containing $w$ in $T'$. If every tree $T_w \in T'$ has at most $N/2$ nodes, we are done and $v$ is the required central node.

Otherwise, there exists a tree $T_w$ that contains more than $N/2$ nodes. There can be only one such tree, since the trees of $T'$ have $N-1$ nodes in total. In this case, we pick $w$ as our new candidate and repeat the process.

This process eventually stops at some candidate node, which is the required central node. When we move from $v$ to $w$, the tree containing $v$ after deleting $w$ has $N - |T_w| < N/2$ nodes, so the next step never goes back to $v$. A walk in a tree that never immediately backtracks cannot revisit a node. Since there are $N$ nodes, the process repeats at most $N$ times.
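The argument translates directly into code. The Python sketch below uses hypothetical names and assumes the tree is given as unweighted adjacency lists, since edge lengths do not matter here. At every step it recomputes the sizes of all $T_w$, then moves into the unique tree with more than $N/2$ nodes until no such tree exists.

```python
def component_sizes_without(adj, v):
    """Sizes of the trees T_w left after deleting v, keyed by the neighbor w."""
    sizes = {}
    for w in adj[v]:
        seen = {v, w}  # v acts as a wall, so the search stays inside T_w
        stack = [w]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        sizes[w] = len(seen) - 1  # do not count v itself
    return sizes


def walk_to_central(adj, start=0):
    """Follow the existence argument: step into the tree with more than N/2 nodes."""
    n = len(adj)
    v = start
    while True:
        sizes = component_sizes_without(adj, v)
        heavy = [w for w, s in sizes.items() if 2 * s > n]
        if not heavy:
            return v  # every T_w has at most N/2 nodes
        v = heavy[0]  # at most one tree can exceed N/2 nodes


path = [[1], [0, 2], [1, 3], [2, 4], [3]]  # the path 0-1-2-3-4
print(walk_to_central(path, 0))  # 2
```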

Once we know that the central node exists, there are many ways to find it. We can follow the process from the argument directly. However, recomputing the tree sizes at every step takes $\mathcal O(N^2)$ time in the worst case, which is too slow to be useful.

The following are two procedures that find the central node in $\mathcal O(N \log N)$ time and $\mathcal O(N)$ time, respectively.

Bottom-up approach

We can find node $u$ in a bottom-up fashion. We shall keep a priority queue $Q$ of all "processed" subtrees using their sizes as weights.

For each node, we maintain its state, which is either new or processed; initially all nodes are new. Every node also has a weight, which is initially $1$.

We start with all leaf nodes in $Q$. Throughout the procedure, $Q$ contains the new nodes that have all but one of their neighbors processed. For each node $v \in Q$, we denote by $p(v)$ the unique neighbor of $v$ that is still new.

While there are nodes in $Q$, we extract the node $v$ with the smallest weight. We update $v$'s state to processed and increase the weight of $p(v)$ by $v$'s weight. If all but one neighbor of $p(v)$ are now processed, we insert $p(v)$ into $Q$.

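Using this procedure, the last node extracted from $Q$ is the desired central node. Whenever any other node $v$ is extracted, $Q$ also contains another new node whose weight is at least that of $v$, and that weight counts a disjoint part of the tree, so $v$'s weight is at most $N/2$. The weights of the last node's neighbors are exactly the sizes of the trees left after deleting it.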
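Below is a possible implementation of the bottom-up procedure with Python's `heapq` module. It assumes unweighted adjacency lists `adj`, and the names are ours. Instead of storing states explicitly, it tracks how many neighbors of each node are still new. A node enters $Q$ when that count drops to one. The function returns the last node extracted.

```python
import heapq


def central_bottom_up(adj):
    """Bottom-up central-node search with a priority queue keyed by weight."""
    n = len(adj)
    if n == 1:
        return 0
    weight = [1] * n
    processed = [False] * n
    new_deg = [len(nbrs) for nbrs in adj]  # neighbors that are still new
    heap = [(1, v) for v in range(n) if new_deg[v] == 1]  # start with all leaves
    heapq.heapify(heap)
    last = None
    while heap:
        _, v = heapq.heappop(heap)  # smallest weight first
        last = v
        processed[v] = True
        for p in adj[v]:
            if not processed[p]:  # p is p(v), the unique new neighbor
                weight[p] += weight[v]
                new_deg[p] -= 1
                if new_deg[p] == 1:  # all but one neighbor of p are processed
                    heapq.heappush(heap, (weight[p], p))
    return last


print(central_bottom_up([[1], [0, 2], [1, 3], [2, 4], [3]]))  # 2
```

The queue makes each extraction cost $\mathcal O(\log N)$, which is where the $\mathcal O(N \log N)$ bound comes from.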

DFS with bookkeeping

With DFS and good bookkeeping, we can find the central node in $\mathcal O(N)$ time.

We pick an arbitrary node $r$ to start a DFS. With this procedure, we can consider $T$ as rooted at $r$, so the parent-child relationship between adjacent nodes is well defined. While performing the DFS, we compute, for each node $v$, the number of its descendants $D(v)$, not counting $v$ itself.

With this information, we can check whether a candidate $u$ is the central node. Consider each node $w$ adjacent to $u$. If $w$ is one of $u$'s children, the tree containing $w$ after deleting $u$ has $D(w)+1$ nodes. If $w$ is $u$'s parent, the size of the tree containing $w$ after deleting $u$ is:

$\displaystyle N-1-\sum_{v \in Ch(u)} (D(v)+1)$

where $Ch(u)$ is the set of children of $u$. If each resulting tree has at most $N/2$ nodes, $u$ is the desired central node. The time needed to check $u$ is proportional to $u$'s degree. Since the degrees sum to $2(N-1)$, we can check all nodes in $\mathcal O(N)$ time.
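The same check can be sketched in code, assuming unweighted adjacency lists `adj` and a root `r` (names are ours). It uses an explicit stack instead of recursion, which only changes the visiting order: every parent is still listed before its children. It then evaluates every candidate with the formulas above.

```python
def central_dfs(adj, r=0):
    """O(N) central-node search using descendant counts D(v)."""
    n = len(adj)
    parent = [-1] * n
    order = []  # every parent appears before its children
    seen = [False] * n
    seen[r] = True
    stack = [r]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if not seen[w]:
                seen[w] = True
                parent[w] = v
                stack.append(w)
    D = [0] * n  # D[v] = number of descendants of v, v excluded
    for v in reversed(order):
        if parent[v] != -1:
            D[parent[v]] += D[v] + 1
    for u in order:
        child_total = 0
        largest = 0
        for w in adj[u]:
            if w != parent[u]:
                child_total += D[w] + 1  # tree containing child w has D(w)+1 nodes
                largest = max(largest, D[w] + 1)
        largest = max(largest, n - 1 - child_total)  # tree containing the parent
        if 2 * largest <= n:
            return u
    return r  # not reached: a central node always exists


print(central_dfs([[1], [0, 2], [1, 3], [2, 4], [3]]))  # 2
```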

Running time

Let $\mathcal T(N)$ be the worst-case running time when the tree has $N$ nodes. We can write the recurrence as:

$\displaystyle \mathcal T(N) = A(N)+cN+\sum_i \mathcal T(N_i)$

where $A(N)$ is the time for finding $u$, $N_i$ is the size of the $i^\text{th}$ resulting tree, and $c$ is some constant.

Since we know that $N_i \le N/2$, there are at most $\mathcal O(\log N)$ levels of the recursion.
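Here is a short derivation of this bound. A tree at recursion depth $d$ has at most $N/2^d$ nodes and contains at least one node, so

$\displaystyle 1 \le \frac{N}{2^{d}} \quad\Longrightarrow\quad d \le \log_2 N,$

that is, there are at most $\lfloor \log_2 N \rfloor + 1$ levels. The trees at a fixed depth are disjoint, so at every level $\sum_i N_i \le N$.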

If we use the $\mathcal O(N)$-time procedure to find $u$, each level runs in $\mathcal O(N)$ time, because the trees at one level are disjoint, and the total running time is $\mathcal O(N \log N)$. If we use the slower $\mathcal O(N \log N)$-time procedure, the total running time is $\mathcal O(N \log^2 N)$.
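Putting the pieces together, here is a sketch of the complete divide-and-conquer solution in Python. It is not the official model solution. The function name `best_path`, the `(a, b, length)` edge format, non-negative lengths, $K \ge 1$, and the return value $-1$ when no path exists are all assumptions. The sketch does three things:

- It finds the central node with subtree sizes (the $\mathcal O(N)$ procedure).
- It combines branches through the array $A$, which is allocated once and reset only at the entries it wrote.
- It deletes the central node and continues with the remaining trees, using an explicit stack instead of recursion.

```python
def best_path(n, k, edges):
    """Fewest edges on a path of total length exactly k, or -1 if there is none."""
    adj = [[] for _ in range(n)]
    for a, b, length in edges:
        adj[a].append((b, length))
        adj[b].append((a, length))
    removed = [False] * n  # central nodes already deleted from the tree
    INF = float("inf")
    A = [INF] * (k + 1)  # allocated once; A[0] = 0 stands for "u is an endpoint"
    A[0] = 0
    best = INF

    def collect(root):
        """Nodes of the current tree in BFS order (parents first) and their parents."""
        order, parent = [root], {root: -1}
        i = 0
        while i < len(order):
            v = order[i]
            i += 1
            for to, _ in adj[v]:
                if not removed[to] and to != parent[v]:
                    parent[to] = v
                    order.append(to)
        return order, parent

    def find_central(root):
        """Central node of the current tree, using subtree sizes D(v) + 1."""
        order, parent = collect(root)
        size = {v: 1 for v in order}
        for v in reversed(order):
            if parent[v] != -1:
                size[parent[v]] += size[v]
        total = len(order)
        for u in order:
            largest = total - size[u]  # tree on the parent's side
            for to, _ in adj[u]:
                if not removed[to] and to != parent[u]:
                    largest = max(largest, size[to])
            if 2 * largest <= total:
                return u
        return root  # not reached: a central node always exists

    pending = [0]  # one node from each tree that still has to be processed
    while pending:
        u = find_central(pending.pop())
        touched = []
        for w, length in adj[u]:
            if removed[w]:
                continue
            branch, stack = [], [(w, u, length, 1)]
            while stack:
                v, par, dist, cnt = stack.pop()
                if dist > k:
                    continue  # with non-negative lengths, extensions are too long as well
                branch.append((dist, cnt))
                for to, step in adj[v]:
                    if to != par and not removed[to]:
                        stack.append((to, v, dist + step, cnt + 1))
            for dist, cnt in branch:  # query first: pairs use different neighbors
                best = min(best, cnt + A[k - dist])
            for dist, cnt in branch:
                if cnt < A[dist]:
                    A[dist] = cnt
                    touched.append(dist)
        for d in touched:  # reset only what this step wrote (A[0] is never written)
            A[d] = INF
        removed[u] = True
        for w, _ in adj[u]:
            if not removed[w]:
                pending.append(w)
    return -1 if best == INF else best


print(best_path(4, 3, [(0, 1, 1), (1, 2, 2), (1, 3, 4)]))  # 2
```

Python is used here for readability. For the full input limits, a submission would normally be written in a compiled language.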

Notes

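There are other heuristics for finding $u$ that do not always work. They do not guarantee that the resulting trees are small, so the number of recursion levels is not bounded by $\mathcal O(\log N)$. Here are some examples.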

In each divide-and-conquer step, pick the node with the highest degree.
In each divide-and-conquer step, pick the node that minimizes the maximum distance to any other node.
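The following Python snippet illustrates why the first heuristic can fail. It builds a small hypothetical "broom", a path with extra leaves attached to one end. On this tree, deleting the highest-degree node leaves a tree with more than $N/2$ nodes, so that step does not halve the problem.

```python
def largest_tree_after_deleting(adj, v):
    """Size of the largest tree left after deleting v."""
    best = 0
    for w in adj[v]:
        seen, stack = {v, w}, [w]  # v acts as a wall
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        best = max(best, len(seen) - 1)
    return best


# Hypothetical broom: the path 0-1-...-10 plus leaves 11, 12, 13 attached to node 0.
n = 14
adj = [[] for _ in range(n)]


def add_edge(a, b):
    adj[a].append(b)
    adj[b].append(a)


for i in range(10):
    add_edge(i, i + 1)
for leaf in (11, 12, 13):
    add_edge(0, leaf)

max_deg_node = max(range(n), key=lambda v: len(adj[v]))
print(max_deg_node, largest_tree_after_deleting(adj, max_deg_node))  # 0 10
```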
