Editorial for IOI '09 P8 - Salesman
Submitting an official solution before solving the problem yourself is a bannable offence.
We'll start by considering only the case where no two fairs occur on the same day. Later we'll show how to modify our algorithm to incorporate fairs that occur on the same day.
The first polynomial solution
First we'll describe a fairly standard dynamic programming algorithm. We order the fairs according to the day when they take place. For each fair we will compute the best profit we can achieve immediately after visiting this fair.
To avoid special cases, we'll add dummy fairs and which both take place at the salesman's home, fair being the first and fair the last of all fairs. We can immediately tell that and that is the answer we are supposed to compute.
The values to can all be computed in order, using the same observation: we have to arrive from some fair, and we may pick which one it is.
Let be the cost of travelling from point to point on the river. If , we have , otherwise we have .
We can then write:
(Explanation: To compute , we pick the number of the fair we visited immediately before fair . Immediately after fair the best profit we could have was . We then have to travel to the location of the current fair, which costs us , and finally we visit fair for a profit . To obtain the largest possible , we take the maximum over all possible choices of .)
The time complexity of this algorithm is , which is sufficient to solve the cases where all the input values are at most .
An improved solution
We will now improve the previous algorithm. Note that the profit from visiting fair is the same for all choices of . Thus, the optimal choice of depends on the profits , the locations , and the location of the current fair.
We can divide the fairs to into two groups: those upstream of , and those downstream. We can now divide our problem "find the optimal " into two subparts: "find the optimal choice for the previous fair upstream" and "find the optimal choice for the previous fair downstream".
Consider locating the optimal previous fair upstream of . If we were to change the value (in such a way that it does not change which other fairs are upstream of fair ), can it influence our choice? No, it can not. If we, for example, increase by , this means that for each of the upstream fairs, the cost of travelling to fair increases by the same amount: . Hence the optimal choice would remain the same.
We will now show a relatively simple data structure that will allow us to locate the optimal previous fair upstream of fair in time.
The data structure is commonly known as an interval tree. We can assign the fairs new labels according to their unique positions on the river. More precisely, let be the number of fairs that are upstream of fair (including those that occur after fair ).
Our interval tree is a complete binary tree with levels, where is the smallest integer such that . Note that .
Leaves in this binary tree correspond to the fairs, and the order in which fairs are assigned to leaves is given by the values . That is, the leftmost leaf is the fair closest to the river source, the second leaf is the second-closest fair, and so on.
Now note that each node in our tree corresponds to an interval of fairs — hence the name "interval tree". In each node of the interval tree, we will store the answer to the following question: "Let be the set of fairs that correspond to leaves in this subtree that were already processed. Supposing that I'm downstream from each of them, which one is the optimal choice?"
Given this information, we can easily determine the optimal choice for the next fair in . And it is also easy to update the information in the tree after fair was processed; this too can be done in .
In our solution we will, of course, have two interval trees: one for the direction upstream and one for the direction downstream. For each fair , we first make two queries to determine the best predecessor upstream and downstream, then we pick the better of those two choices, compute , and finally we update both interval trees.
Hence we process each fair in , leading to the total time complexity .
Another equally good solution
In this section, we will show another solution with the same complexity, which uses an "ordered set" data structure only, and can easily be implemented in C++ using the set
class.
As before, we will process the fairs one by one, ordered by the day on which they occur. Imagine a situation after we have processed some fairs. Let and be two fairs that we have already processed. We say that is covered by if .
In human words, is covered by if the strategy "visit fair last and then move to the location of fair " is at least as good as the strategy "visit fair last".
Once fair is covered by some other fair , this fair will never be an optimal predecessor for any later fair. Fair will always (regardless of the location of the later fair) be at least as good a choice as .
On the other hand, if a fair is currently not covered by any other fair, there are some locations on the river for which would be the optimal predecessor — at least the location and its immediate surroundings. We will call such fairs active.
In our solution, we will maintain the set of currently active fairs, ordered by their position on the river. We will use an "ordered set" data structure, most commonly implemented as a balanced binary tree.
It can easily be shown that for each active fair there is an interval of the river where is the optimal choice. These intervals are obviously disjoint (except possibly for their endpoints), and together they cover the entire river. And as the interval for contains , the intervals are in the same order as their corresponding active fairs.
Hence whenever we are going to process a new fair , we only have to locate the closest active fairs upstream and downstream of — one of these two must be the optimal choice.
After we process the fair and compute , we have to update the set of active fairs. Clearly, is now active, as we computed by taking the best way of getting to , and then added a positive profit . We add it into the set of active fairs. But we are not done yet — might now cover some of the previously active fairs. But these are easy to find: if neither of the immediate neighbours of (in the set of active fairs) is covered by , we are obviously done. If some of them are covered by , erase them from the set and repeat the check again.
In this solution, each fair is inserted into the set of active fairs once, and is erased from the set at most once. In addition, when processing each fair we make one query to find the closest two active fairs. Each of these operations takes , hence the total time complexity is .
Multiple fairs on the same day
First of all, note that we cannot process fairs that are on the same day one by one — because we must allow the salesman to visit them in a different order than the one we picked.
There may be many ways in which to visit the fairs on a given day. However, we don't need to consider all of them, just some subset that surely contains the optimal solution.
Suppose that we already picked some order in which to visit the fairs on a given day. Let and be the fairs furthest upstream and downstream we visit. We can then, obviously, visit all fairs between and as well, as we'll surely be travelling through their locations. And clearly to visit all of these fairs, it's enough to travel first to and then from to , or vice versa. We will only consider such paths.
We will process each day in two phases. In the first phase, we process each fair separately, as if it were the only fair that day, and we determine a preliminary value — the best profit we can have after coming to fair from some fair on a previous day.
In the second phase, we will take travelling upstream or downstream into account. We will consider each direction separately. When processing a direction, we'll process the fairs in order, and for each of them we'll determine whether it is more profitable to start at this fair (i.e., use the value computed in the previous step) or to start sooner (i.e., use the optimal value computed for the previous fair in this step, subtract the cost of travel from that fair to this one, and add the profit from this fair).
For each fair , the actual value is then equal to the larger of the two values we get for travelling upstream and downstream.
(DMOJ curator's note: I'm not sure which sections the following paragraph is referring to)
Finally, we need to update the set of active fairs. When using an interval tree data structure as in Section , this is accomplished simply by adding each fair to the upstream and downstream interval trees. When using an ordered set as in Section , one must take a little more care, as not all of the fairs that we have just processed will be active. This is easily catered for by modifying our update process — before inserting a new active fair, we check that the fair is actually active by examining its potential neighbours in the data structure. If either of the neighbouring fairs covers the one being added, then it is not active, and so should not be added to the active fairs set. With this modification, we can update the entire data structure by sequentially attempting to add each fair (in any order).
Clearly, the additional time needed to process the second phase on any day is linear in the number of fairs that day, assuming we already have them sorted according to their location (which is easily accomplished by adding this as a tiebreaker to the comparison function used to sort all fairs in the beginning). Furthermore, the update steps for the interval tree and ordered set both take time. Therefore this extra step does not change the total time complexity of our algorithm: it is still .
Comments