Given a tree
with
nodes, this task asks for a path
of length
with the minimum number of edges. It looks like a usual dynamic programming task. However, when
is large, another approach is required.
The model solution for this task follows a divide-and-conquer approach.
Consider a node
in the graph. There are two possible cases: when node
belongs to the solution path
; or when node
does not.
In the second case, we can delete node
from the tree and break it into smaller trees. We can then recurse on each of the resulting trees to find the solution.
With this general approach in mind, we have to answer the following questions.
- How to find the best path that contains node
?
- How to choose
to achieve a better running time?
Note that the second question is very important because if we can guarantee that the sizes of all resulting trees are small, we can bound the number of recursion levels.
Finding the solution containing 
Consider the case that
contains node
. Let's consider a simpler case where we only want to find if there exists a path of length exactly
that contains
.
If
is one of the endpoints in
, we can find the path using one application of depth first search (DFS).
However, if
is "inside"
, then two of
's adjacent nodes
and
must also be in
. Thus, we shall find
and
.
Consider some node
adjacent to
. With one application of DFS, we can find the set
of all path lengths for all paths starting at
and containing edge
.
Hence, to find
and
, we need to find two nodes
and
such that there exists a pair
and
for which
. This can be done by DFS from
through every edge
for all adjacent nodes
with careful bookkeeping using an array
of size
.
The running time for this step is
.
Finding the right node
Our goal is to find node
such that after deleting
, all resulting trees are sufficiently "small." In this case, we shall find node
such that each remaining tree has at most
nodes. We shall refer to node
as the central node.
It is not clear if such a node exists. So let's argue about that first.
Pick an arbitrary node
as a candidate. Denote by
the forest obtained by deleting
from
. For each node
adjacent to
, denote by
the tree containing
in
. If every tree
has at most
nodes, we are done and
is the required central node.
Otherwise, there exists one tree
that contains more than
nodes. (Note that there can be only one tree violating our criteria.) In this case, we pick
as our new candidate and repeat the process.
This process will eventually stop at some candidate node and that's the required central node. To see this, note that after leaving
, we shall never go back to pick
again; since there are
nodes, the process can repeat at most
times.
After knowing that the central node exists, there are many ways to find it. We can follow the process directly as in the argument. But this is too slow to be useful.
The following are two procedures that find the central node in
time and
time.
Bottom-up approach
We can find node
in a bottom-up fashion. We shall keep a priority queue
of all "processed" subtrees using their sizes as weights.
We maintain, for each node, its state which can either be new or processed; initially all nodes are new. Every node also has a weight. Initially, every node has a weight of
.
We start with all leaf nodes in
. Note that each node in
is every node which has all but one of its adjacent nodes processed. For each node
, we denote by
the unique neighbor of
which is new.
While there are nodes in
, we extract node
with the smallest weight. We update
's state to processed and increase the weight of
by
's weight. If all but one neighbor of
are processed, we insert
into
.
Using this procedure, the last node inserted to
is the desired central node.
DFS with bookkeeping
With DFS and good bookkeeping, we can find the central node in
.
We pick an arbitrary node
to start a DFS. With this procedure, we can consider
as rooted at
and the parent-child relationship between adjacent pairs of nodes are clearly defined. While performing DFS, we compute, for each node
, the number of its descendants
.
With this information, we can figure out if a candidate
is the central node. For each node
adjacent to
, if
is one of
's children, the size of the resulting tree containing
after deleting
is
. If
is
's parent, the size of the resulting tree containing
after deleting
is:

where
is the set of children of
. If the size of each resulting tree is at most
,
is the desired central node. The time needed to check
is proportional to
's degree. Therefore, we can check all nodes in time
.
Running time
Let
be the worst-case running time when the tree has
nodes. We can write the recurrence as:

where
is the time for finding
,
is the size of the
new trees, and
is some constant.
Since we know that
, there are at most
levels of the recursion.
If we use
-time to find
, each level would run in time
and the total running time is
. If we use a slower
-time procedure, the total running time will be
.
Notes
There are other heuristics for finding
that do not always work. Here are some examples.
- In a divide-and-conquer solution, the highest degree node is picked.
- In a divide-and-conquer solution, the node that minimizes the maximum distance to any node is picked.
Comments