Editorial for IOI '09 P8 - Salesman - Olympiads Online Judge

Editorial for IOI '09 P8 - Salesman

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

We'll start by considering only the case where no two fairs occur on the same day. Later we'll show how to modify our algorithm to incorporate fairs that occur on the same day.

The first polynomial solution

First we'll describe a fairly standard dynamic programming algorithm. We order the fairs according to the day when they take place. For each fair $i$ ~i~ we will compute the best profit $P_i$ ~P_i~ we can achieve immediately after visiting this fair.

To avoid special cases, we'll add dummy fairs $0$ ~0~ and $N+1$ ~N+1~ which both take place at the salesman's home, fair $0$ ~0~ being the first and fair $N+1$ ~N+1~ the last of all fairs. We can immediately tell that $P_0 = 0$ ~P_0 = 0~ and that $P_{N+1}$ ~P_{N+1}~ is the answer we are supposed to compute.

The values $P_1$ ~P_1~ to $P_{N+1}$ ~P_{N+1}~ can all be computed in order, using the same observation: we have to arrive from some fair, and we may pick which one it is.

Let $cost(x, y)$ ~cost(x, y)~ be the cost of travelling from point $x$ ~x~ to point $y$ ~y~ on the river. If $x \le y$ ~x \le y~, we have $cost(x, y) = (y-x)D$ ~cost(x, y) = (y-x)D~, otherwise we have $cost(x, y) = (x-y)U$ ~cost(x, y) = (x-y)U~.

We can then write:

$\displaystyle \forall i \in \{1, \dots, N+1\} : P_x = \max_{0 \le j < i} (P_j-cost(L_j, L_i))+M_i$

(Explanation: To compute $P_i$ ~P_i~, we pick the number $j$ ~j~ of the fair we visited immediately before fair $i$ ~i~. Immediately after fair $j$ ~j~ the best profit we could have was $P_j$ ~P_j~. We then have to travel to the location of the current fair, which costs us $cost(L_j, L_i)$ ~cost(L_j, L_i)~, and finally we visit fair $i$ ~i~ for a profit $M_i$ ~M_i~. To obtain the largest possible $P_i$ ~P_i~, we take the maximum over all possible choices of $j$ ~j~.)

The time complexity of this algorithm is $\mathcal O(N^2)$ ~\mathcal O(N^2)~, which is sufficient to solve the cases where all the input values are at most $5\,000$ ~5\,000~.

An improved solution

We will now improve the previous algorithm. Note that the profit $M_i$ ~M_i~ from visiting fair $i$ ~i~ is the same for all choices of $j$ ~j~. Thus, the optimal choice of $j$ ~j~ depends on the profits $P_0, \dots, P_{i-1}$ ~P_0, \dots, P_{i-1}~, the locations $L_0, \dots, L_{i-1}$ ~L_0, \dots, L_{i-1}~, and the location $L_i$ ~L_i~ of the current fair.

We can divide the fairs $0$ ~0~ to $i-1$ ~i-1~ into two groups: those upstream of $L_i$ ~L_i~, and those downstream. We can now divide our problem "find the optimal $j$ ~j~" into two subparts: "find the optimal choice for the previous fair upstream" and "find the optimal choice for the previous fair downstream".

Consider locating the optimal previous fair upstream of $L_i$ ~L_i~. If we were to change the value $L_i$ ~L_i~ (in such a way that it does not change which other fairs are upstream of fair $i$ ~i~), can it influence our choice? No, it can not. If we, for example, increase $L_i$ ~L_i~ by $\Delta$ ~\Delta~, this means that for each of the upstream fairs, the cost of travelling to fair $i$ ~i~ increases by the same amount: $D \Delta$ ~D \Delta~. Hence the optimal choice would remain the same.

We will now show a relatively simple data structure that will allow us to locate the optimal previous fair upstream of fair $i$ ~i~ in $\mathcal O(\log N)$ ~\mathcal O(\log N)~ time.

The data structure is commonly known as an interval tree. We can assign the fairs new labels according to their unique positions on the river. More precisely, let $l_f$ ~l_f~ be the number of fairs that are upstream of fair $f$ ~f~ (including those that occur after fair $f$ ~f~).

Our interval tree is a complete binary tree with $k$ ~k~ levels, where $k$ ~k~ is the smallest integer such that $2_{k-1} \ge N+2$ ~2_{k-1} \ge N+2~. Note that $k = \mathcal O(\log N)$ ~k = \mathcal O(\log N)~.

Leaves in this binary tree correspond to the fairs, and the order in which fairs are assigned to leaves is given by the values $l_i$ ~l_i~. That is, the leftmost leaf is the fair closest to the river source, the second leaf is the second-closest fair, and so on.

Now note that each node in our tree corresponds to an interval of fairs — hence the name "interval tree". In each node of the interval tree, we will store the answer to the following question: "Let $S$ ~S~ be the set of fairs that correspond to leaves in this subtree that were already processed. Supposing that I'm downstream from each of them, which one is the optimal choice?"

Given this information, we can easily determine the optimal choice for the next fair $i$ ~i~ in $\mathcal O(\log N)$ ~\mathcal O(\log N)~. And it is also easy to update the information in the tree after fair $i$ ~i~ was processed; this too can be done in $\mathcal O(\log N)$ ~\mathcal O(\log N)~.

In our solution we will, of course, have two interval trees: one for the direction upstream and one for the direction downstream. For each fair $i$ ~i~, we first make two queries to determine the best predecessor upstream and downstream, then we pick the better of those two choices, compute $P_i$ ~P_i~, and finally we update both interval trees.

Hence we process each fair in $\mathcal O(\log N)$ ~\mathcal O(\log N)~, leading to the total time complexity $\mathcal O(N \log N)$ ~\mathcal O(N \log N)~.

Another equally good solution

In this section, we will show another solution with the same complexity, which uses an "ordered set" data structure only, and can easily be implemented in C++ using the set class.

As before, we will process the fairs one by one, ordered by the day on which they occur. Imagine a situation after we have processed some fairs. Let $a$ ~a~ and $b$ ~b~ be two fairs that we have already processed. We say that $a$ ~a~ is covered by $b$ ~b~ if $P_a \le P_b-cost(L_b, L_a)$ ~P_a \le P_b-cost(L_b, L_a)~.

In human words, $a$ ~a~ is covered by $b$ ~b~ if the strategy "visit fair $b$ ~b~ last and then move to the location of fair $a$ ~a~" is at least as good as the strategy "visit fair $a$ ~a~ last".

Once fair $a$ ~a~ is covered by some other fair $b$ ~b~, this fair will never be an optimal predecessor for any later fair. Fair $b$ ~b~ will always (regardless of the location of the later fair) be at least as good a choice as $a$ ~a~.

On the other hand, if a fair is currently not covered by any other fair, there are some locations on the river for which $b$ ~b~ would be the optimal predecessor — at least the location $L_b$ ~L_b~ and its immediate surroundings. We will call such fairs active.

In our solution, we will maintain the set of currently active fairs, ordered by their position on the river. We will use an "ordered set" data structure, most commonly implemented as a balanced binary tree.

It can easily be shown that for each active fair $f$ ~f~ there is an interval of the river where $f$ ~f~ is the optimal choice. These intervals are obviously disjoint (except possibly for their endpoints), and together they cover the entire river. And as the interval for $f$ ~f~ contains $f$ ~f~, the intervals are in the same order as their corresponding active fairs.

Hence whenever we are going to process a new fair $i$ ~i~, we only have to locate the closest active fairs upstream and downstream of $i$ ~i~ — one of these two must be the optimal choice.

After we process the fair $i$ ~i~ and compute $P_i$ ~P_i~, we have to update the set of active fairs. Clearly, $i$ ~i~ is now active, as we computed $P_i$ ~P_i~ by taking the best way of getting to $L_i$ ~L_i~, and then added a positive profit $M_i$ ~M_i~. We add it into the set of active fairs. But we are not done yet — $i$ ~i~ might now cover some of the previously active fairs. But these are easy to find: if neither of the immediate neighbours of $i$ ~i~ (in the set of active fairs) is covered by $i$ ~i~, we are obviously done. If some of them are covered by $i$ ~i~, erase them from the set and repeat the check again.

In this solution, each fair is inserted into the set of active fairs once, and is erased from the set at most once. In addition, when processing each fair we make one query to find the closest two active fairs. Each of these operations takes $\mathcal O(\log N)$ ~\mathcal O(\log N)~, hence the total time complexity is $\mathcal O(N \log N)$ ~\mathcal O(N \log N)~.

Multiple fairs on the same day

First of all, note that we cannot process fairs that are on the same day one by one — because we must allow the salesman to visit them in a different order than the one we picked.

There may be many ways in which to visit the fairs on a given day. However, we don't need to consider all of them, just some subset that surely contains the optimal solution.

Suppose that we already picked some order in which to visit the fairs on a given day. Let $u$ ~u~ and $d$ ~d~ be the fairs furthest upstream and downstream we visit. We can then, obviously, visit all fairs between $u$ ~u~ and $v$ ~v~ as well, as we'll surely be travelling through their locations. And clearly to visit all of these fairs, it's enough to travel first to $u$ ~u~ and then from $u$ ~u~ to $v$ ~v~, or vice versa. We will only consider such paths.

We will process each day in two phases. In the first phase, we process each fair $i$ ~i~ separately, as if it were the only fair that day, and we determine a preliminary value $P_i$ ~P_i~ — the best profit we can have after coming to fair $i$ ~i~ from some fair on a previous day.

In the second phase, we will take travelling upstream or downstream into account. We will consider each direction separately. When processing a direction, we'll process the fairs in order, and for each of them we'll determine whether it is more profitable to start at this fair (i.e., use the value computed in the previous step) or to start sooner (i.e., use the optimal value computed for the previous fair in this step, subtract the cost of travel from that fair to this one, and add the profit from this fair).

For each fair $i$ ~i~, the actual value $P_i$ ~P_i~ is then equal to the larger of the two values we get for travelling upstream and downstream.

(DMOJ curator's note: I'm not sure which sections the following paragraph is referring to)

Finally, we need to update the set of active fairs. When using an interval tree data structure as in Section , this is accomplished simply by adding each fair to the upstream and downstream interval trees. When using an ordered set as in Section , one must take a little more care, as not all of the fairs that we have just processed will be active. This is easily catered for by modifying our update process — before inserting a new active fair, we check that the fair is actually active by examining its potential neighbours in the data structure. If either of the neighbouring fairs covers the one being added, then it is not active, and so should not be added to the active fairs set. With this modification, we can update the entire data structure by sequentially attempting to add each fair (in any order).

Clearly, the additional time needed to process the second phase on any day is linear in the number of fairs that day, assuming we already have them sorted according to their location (which is easily accomplished by adding this as a tiebreaker to the comparison function used to sort all fairs in the beginning). Furthermore, the update steps for the interval tree and ordered set both take $\mathcal O(\log N)$ ~\mathcal O(\log N)~ time. Therefore this extra step does not change the total time complexity of our algorithm: it is still $\mathcal O(N \log N)$ ~\mathcal O(N \log N)~.

Comments

There are no comments at the moment.