Editorial for IOI '09 P8 - Salesman
Submitting an official solution before solving the problem yourself is a bannable offence.
We'll start by considering only the case where no two fairs occur on the same day. Later we'll show how to modify our algorithm to incorporate fairs that occur on the same day.
The first polynomial solution
First we'll describe a fairly standard dynamic programming algorithm. We order the fairs according to the day when they take place. For each fair  we will compute the best profit 
 we can achieve immediately after visiting this fair.
To avoid special cases, we'll add dummy fairs  and 
 which both take place at the salesman's home, fair 
 being the first and fair 
 the last of all fairs. We can immediately tell that 
 and that 
 is the answer we are supposed to compute.
The values  to 
 can all be computed in order, using the same observation: we have to arrive from some fair, and we may pick which one it is.
Let  be the cost of travelling from point 
 to point 
 on the river. If 
, we have 
, otherwise we have 
.
We can then write:
(Explanation: To compute , we pick the number 
 of the fair we visited immediately before fair 
. Immediately after fair 
 the best profit we could have was 
. We then have to travel to the location of the current fair, which costs us 
, and finally we visit fair 
 for a profit 
. To obtain the largest possible 
, we take the maximum over all possible choices of 
.)
The time complexity of this algorithm is , which is sufficient to solve the cases where all the input values are at most 
.
An improved solution
We will now improve the previous algorithm. Note that the profit  from visiting fair 
 is the same for all choices of 
. Thus, the optimal choice of 
 depends on the profits 
, the locations 
, and the location 
 of the current fair.
We can divide the fairs  to 
 into two groups: those upstream of 
, and those downstream. We can now divide our problem "find the optimal 
" into two subparts: "find the optimal choice for the previous fair upstream" and "find the optimal choice for the previous fair downstream".
Consider locating the optimal previous fair upstream of . If we were to change the value 
 (in such a way that it does not change which other fairs are upstream of fair 
), can it influence our choice? No, it can not. If we, for example, increase 
 by 
, this means that for each of the upstream fairs, the cost of travelling to fair 
 increases by the same amount: 
. Hence the optimal choice would remain the same.
We will now show a relatively simple data structure that will allow us to locate the optimal previous fair upstream of fair  in 
 time.
The data structure is commonly known as an interval tree. We can assign the fairs new labels according to their unique positions on the river. More precisely, let  be the number of fairs that are upstream of fair 
 (including those that occur after fair 
).
Our interval tree is a complete binary tree with  levels, where 
 is the smallest integer such that 
. Note that 
.
Leaves in this binary tree correspond to the fairs, and the order in which fairs are assigned to leaves is given by the values . That is, the leftmost leaf is the fair closest to the river source, the second leaf is the second-closest fair, and so on.
Now note that each node in our tree corresponds to an interval of fairs — hence the name "interval tree". In each node of the interval tree, we will store the answer to the following question: "Let  be the set of fairs that correspond to leaves in this subtree that were already processed. Supposing that I'm downstream from each of them, which one is the optimal choice?"
Given this information, we can easily determine the optimal choice for the next fair  in 
. And it is also easy to update the information in the tree after fair 
 was processed; this too can be done in 
.
In our solution we will, of course, have two interval trees: one for the direction upstream and one for the direction downstream. For each fair , we first make two queries to determine the best predecessor upstream and downstream, then we pick the better of those two choices, compute 
, and finally we update both interval trees.
Hence we process each fair in , leading to the total time complexity 
.
Another equally good solution
In this section, we will show another solution with the same complexity, which uses an "ordered set" data structure only, and can easily be implemented in C++ using the set class.
As before, we will process the fairs one by one, ordered by the day on which they occur. Imagine a situation after we have processed some fairs. Let  and 
 be two fairs that we have already processed. We say that 
 is covered by 
 if 
.
In human words,  is covered by 
 if the strategy "visit fair 
 last and then move to the location of fair 
" is at least as good as the strategy "visit fair 
 last".
Once fair  is covered by some other fair 
, this fair will never be an optimal predecessor for any later fair. Fair 
 will always (regardless of the location of the later fair) be at least as good a choice as 
.
On the other hand, if a fair is currently not covered by any other fair, there are some locations on the river for which  would be the optimal predecessor — at least the location 
 and its immediate surroundings. We will call such fairs active.
In our solution, we will maintain the set of currently active fairs, ordered by their position on the river. We will use an "ordered set" data structure, most commonly implemented as a balanced binary tree.
It can easily be shown that for each active fair  there is an interval of the river where 
 is the optimal choice. These intervals are obviously disjoint (except possibly for their endpoints), and together they cover the entire river. And as the interval for 
 contains 
, the intervals are in the same order as their corresponding active fairs.
Hence whenever we are going to process a new fair , we only have to locate the closest active fairs upstream and downstream of 
 — one of these two must be the optimal choice.
After we process the fair  and compute 
, we have to update the set of active fairs. Clearly, 
 is now active, as we computed 
 by taking the best way of getting to 
, and then added a positive profit 
. We add it into the set of active fairs. But we are not done yet — 
 might now cover some of the previously active fairs. But these are easy to find: if neither of the immediate neighbours of 
 (in the set of active fairs) is covered by 
, we are obviously done. If some of them are covered by 
, erase them from the set and repeat the check again.
In this solution, each fair is inserted into the set of active fairs once, and is erased from the set at most once. In addition, when processing each fair we make one query to find the closest two active fairs. Each of these operations takes , hence the total time complexity is 
.
Multiple fairs on the same day
First of all, note that we cannot process fairs that are on the same day one by one — because we must allow the salesman to visit them in a different order than the one we picked.
There may be many ways in which to visit the fairs on a given day. However, we don't need to consider all of them, just some subset that surely contains the optimal solution.
Suppose that we already picked some order in which to visit the fairs on a given day. Let  and 
 be the fairs furthest upstream and downstream we visit. We can then, obviously, visit all fairs between 
 and 
 as well, as we'll surely be travelling through their locations. And clearly to visit all of these fairs, it's enough to travel first to 
 and then from 
 to 
, or vice versa. We will only consider such paths.
We will process each day in two phases. In the first phase, we process each fair  separately, as if it were the only fair that day, and we determine a preliminary value 
 — the best profit we can have after coming to fair 
 from some fair on a previous day.
In the second phase, we will take travelling upstream or downstream into account. We will consider each direction separately. When processing a direction, we'll process the fairs in order, and for each of them we'll determine whether it is more profitable to start at this fair (i.e., use the value computed in the previous step) or to start sooner (i.e., use the optimal value computed for the previous fair in this step, subtract the cost of travel from that fair to this one, and add the profit from this fair).
For each fair , the actual value 
 is then equal to the larger of the two values we get for travelling upstream and downstream.
(DMOJ curator's note: I'm not sure which sections the following paragraph is referring to)
Finally, we need to update the set of active fairs. When using an interval tree data structure as in Section , this is accomplished simply by adding each fair to the upstream and downstream interval trees. When using an ordered set as in Section , one must take a little more care, as not all of the fairs that we have just processed will be active. This is easily catered for by modifying our update process — before inserting a new active fair, we check that the fair is actually active by examining its potential neighbours in the data structure. If either of the neighbouring fairs covers the one being added, then it is not active, and so should not be added to the active fairs set. With this modification, we can update the entire data structure by sequentially attempting to add each fair (in any order).
Clearly, the additional time needed to process the second phase on any day is linear in the number of fairs that day, assuming we already have them sorted according to their location (which is easily accomplished by adding this as a tiebreaker to the comparison function used to sort all fairs in the beginning). Furthermore, the update steps for the interval tree and ordered set both take  time. Therefore this extra step does not change the total time complexity of our algorithm: it is still 
.
Comments