Editorial for CEOI '16 P4 - Match


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Let's recall the traditional algorithm for checking whether a bracket string is a valid bracket sequence. The algorithm iterates over all characters sequentially and makes use of a stack. If the next character to process is an opening bracket, we push it to the top of the stack. If it is a closing bracket, we check the state of the stack. If it is empty, the sequence is not valid. Otherwise, we erase its top item. If at the end of the process the stack is non-empty, again, the sequence is not valid. Note that this also gives us a way to identify "matching" brackets (in the sense defined in the problem statement). Every opening bracket is matched with the closing bracket that "popped" it off the top of the stack.
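As a concrete reference, here is a minimal sketch of this classical check for a plain string of ( and ) characters (the function name isValid is ours, not part of the problem):

```cpp
#include <stack>
#include <string>

// Classical stack-based validity check for a string of '(' and ')'.
bool isValid(const std::string& s) {
    std::stack<char> st;
    for (char c : s) {
        if (c == '(') {
            st.push(c);        // opening bracket: push it
        } else if (st.empty()) {
            return false;      // closing bracket with nothing to match
        } else {
            st.pop();          // this ')' matches the '(' on top of the stack
        }
    }
    return st.empty();         // leftover opening brackets mean failure
}
```

The bracket popped on each `st.pop()` is exactly the matching bracket in the sense defined above.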

Solution 1 – 10 points

It's reasonable to expect that this stack-based approach carries over to our current problem. We proceed in much the same manner, processing the characters of string S from left to right: if the stack is empty, the current character is pushed to the top of the stack, acting as an opening bracket. The same happens if the stack is not empty but its topmost character differs from the current one. However, if the stack's topmost character is the same as the current one, we may also opt to convert the current character into a closing bracket and "pop" the other one off the stack. It's not clear how a particular decision affects future options or the existence of a solution, so the safest approach is to backtrack through all possible sequences of decisions and retain the best valid bracket sequence, or detect that none exists. Because each character leaves us with at most 2 options, the time complexity of this algorithm is bounded by \mathcal O(2^N). It should score 10 points.
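A backtracking implementation of this exhaustive search might look as follows (a sketch; the name matchBacktrack and the convention of returning "" when no solution exists are our assumptions, and '(' < ')' in ASCII, so comparing candidate strings with < picks the lexicographically smallest one):

```cpp
#include <functional>
#include <string>
#include <vector>

// Exhaustive search: every character may open a bracket of its letter, and
// may close one when the stack top carries the same letter. Returns the
// lexicographically smallest valid assignment, or "" if none exists.
std::string matchBacktrack(const std::string& s) {
    std::string best, cur;
    std::vector<char> st;
    std::function<void(std::size_t)> rec = [&](std::size_t i) {
        if (i == s.size()) {
            if (st.empty() && (best.empty() || cur < best)) best = cur;
            return;
        }
        // Option 1: open a bracket of letter s[i].
        st.push_back(s[i]);
        cur.push_back('(');
        rec(i + 1);
        cur.pop_back();
        st.pop_back();
        // Option 2: close the bracket on top, if its letter agrees.
        if (!st.empty() && st.back() == s[i]) {
            st.pop_back();
            cur.push_back(')');
            rec(i + 1);
            cur.pop_back();
            st.push_back(s[i]);
        }
    };
    rec(0);
    return best;
}
```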

Solution 2 – 37 points

Going further, let's find a faster algorithm, but for a simpler problem. How fast can we decide if any matching valid bracket sequence exists for the given string? Intuitively, it seems that the only type of "bad" decision one can make is to open a bracket instead of closing an existing one. This is because the only way the algorithm can fail is by not having an empty stack at the end. It turns out that, indeed, closing brackets as soon as possible is a valid way of finding a solution if one exists.

To prove this, consider a scenario in which there exists a solution that this algorithm does not find, and show that it can be transformed into another solution that the algorithm does find. We therefore have a simple \mathcal O(N) algorithm that checks whether a valid bracket sequence exists; let's name it \text{DECIDE}. We can then apply a classical technique for obtaining lexicographically minimal solutions. At each decision point, greedily try to convert the character into an opening bracket. To check whether this might be a bad decision, run \text{DECIDE} on the current stack and the remaining string. If \text{DECIDE} reports that no valid solution remains, convert the character into a closing bracket instead. This gives us a simple \mathcal O(N^2) algorithm for our original problem. It should score 37 points.
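A sketch of \text{DECIDE} and the resulting quadratic construction could look like this (function names are ours; "" again signals that no solution exists):

```cpp
#include <string>
#include <vector>

// DECIDE: feasibility check -- greedily close whenever the stack top
// carries the same letter as the current character.
bool decide(const std::string& suffix, std::vector<char> st = {}) {
    for (char c : suffix) {
        if (!st.empty() && st.back() == c) st.pop_back();
        else st.push_back(c);
    }
    return st.empty();
}

// O(N^2) construction: try opening first (since '(' < ')') and keep that
// choice only if DECIDE still succeeds on the stack plus the remaining
// suffix; otherwise close.
std::string matchQuadratic(const std::string& s) {
    if (!decide(s)) return "";           // no valid bracket sequence at all
    std::vector<char> st;
    std::string res;
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::vector<char> tentative = st;
        tentative.push_back(s[i]);
        if (decide(s.substr(i + 1), tentative)) {
            st.push_back(s[i]);          // opening keeps a solution reachable
            res += '(';
        } else {
            st.pop_back();               // closing must work: a solution still
            res += ')';                  // exists, and opening was ruled out
        }
    }
    return res;
}
```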

Solution 3 – 100 points

Now let's look for a faster algorithm. Note that the first character will always be an opening bracket. Let's assume we already magically know where its matching closing bracket will be in the optimal solution; denote this position by \text{OptMatchBeginning}. Then we can set the 0^\text{th} character of the solution to be (, the \text{OptMatchBeginning}^\text{th} character to be ), and solve S[1 \dots \text{OptMatchBeginning}-1] and S[\text{OptMatchBeginning}+1 \dots N-1] recursively.

This would be handy indeed, so let's see what \text{OptMatchBeginning} is and how we can find it quickly. For a certain position, \text{Pos}, to be a valid candidate, it should hold that S[\text{Pos}] = S[0] and \text{DECIDE}(S[1 \dots \text{Pos}-1]) = \text{True} (why?). Acting again on intuition, we conjecture that among these valid positions, the right-most one is the optimal one. To prove this, suppose \text{PosLeft} and \text{PosRight} are two candidates, with \text{PosLeft} < \text{PosRight}. We know the following:

  1. \text{DECIDE}(S[1 \dots \text{PosLeft}-1]) = \text{True}
  2. \text{DECIDE}(S[1 \dots \text{PosRight}-1]) = \text{True}

We also introduce

Lemma 1: If \text{DECIDE}(S[x \dots z]) = \text{True} and \text{DECIDE}(S[x \dots y]) = \text{True}, with y < z, then it holds that \text{DECIDE}(S[y+1 \dots z]) = \text{True}.

Proof hint: \text{DECIDE} behaves identically for S[x \dots z] and S[x \dots y] up to and including position y. Also, because S[x \dots y] is solvable, it means that immediately after position y, the stack is empty. It will also be empty after position z.

Applying Lemma 1 to (1) and (2), we deduce that \text{DECIDE}(S[\text{PosLeft} \dots \text{PosRight}-1]) = \text{True}. We can infer that matching S[0] with S[\text{PosRight}] gives us a strictly better (lexicographically smaller) solution than matching it with S[\text{PosLeft}]: the solution for S[1 \dots \text{PosLeft}-1] can remain exactly the same in both cases, but in the former, S[\text{PosLeft}] can become an opening bracket instead of a closing one.

We still have to make sure we can find \text{OptMatchBeginning} quickly in each recursive call. We'll pose the problem in such a way that it lends itself to an easy precomputation. We state the following:

Lemma 2: \text{OptMatchBeginning} is equal to the maximum position X for which \text{DECIDE}(S[X+1 \dots N-1]) = \text{True} and S[X] = S[0].

Proof hint: Using Lemma 1 (we actually apply it on reversed strings, but it's easy to see everything stays the same), we deduce that \text{DECIDE}(S[0 \dots X]) = \text{True}. The last thing we should prove is that \text{DECIDE}(S[1 \dots X-1]) = \text{True} (so S[0] can really be matched with S[X]). Imagine any solution for \text{DECIDE}(S[0 \dots X]). If S[0] and S[X] are not matched with each other, but with, say, S[Y] and S[Z] respectively, we can safely match S[Y] and S[Z] together and then match S[0] with S[X] keeping the rest of the solution exactly the same.

Now that Lemma 2 is proven true, we'll precompute the following table: \text{prevStart}[i][c] = the maximum position j, with j \le i such that \text{DECIDE}(S[j+1 \dots i]) = \text{True} and S[j] = c. This precomputation needs \mathcal O(N \Sigma) memory and \mathcal O(N \Sigma) time (its recurrence is left as an exercise). Then, \text{OptMatchBeginning} is simply equal to \text{prevStart}[N-1][S[0]]. It may not be apparent that doing this precomputation for the original string is enough to correctly determine \text{OptMatchBeginning} values for all recursive calls. Specifically, say we're solving S[\text{left} \dots \text{right}] recursively; is it always true that \text{prevStart}[\text{right}][S[\text{left}]] > \text{left}? This is indeed the case, but if you're unconvinced, try proving it formally. The reasoning is similar to our previous proof outlines.
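Putting Lemma 2 and the precomputation together, a full sketch might look as follows (a sketch under stated assumptions, not a definitive implementation: matchFast is our name, the input is assumed to consist of lowercase letters, and the recurrence used to fill \text{prevStart} is one possible answer to the exercise above):

```cpp
#include <string>
#include <utility>
#include <vector>

// Full O(N * Sigma) solution sketch. prevStart[i][c] = maximum j <= i with
// S[j] == c and DECIDE(S[j+1..i]) == true, or -1 if no such j exists.
std::string matchFast(const std::string& s) {
    const int n = (int)s.size(), SIGMA = 26;
    std::vector<std::vector<int>> prevStart(n, std::vector<int>(SIGMA, -1));
    for (int i = 0; i < n; ++i) {
        for (int c = 0; c < SIGMA; ++c) {
            if (c == s[i] - 'a') {
                prevStart[i][c] = i;  // empty segment S[i+1..i] is trivially fine
            } else if (i >= 1) {
                // In any solvable segment ending at i, S[i] closes m, the
                // right-most valid opener of the same letter; everything
                // before m is an independent subproblem.
                int m = prevStart[i - 1][s[i] - 'a'];
                if (m >= 1) prevStart[i][c] = prevStart[m - 1][c];
            }
        }
    }
    std::string res(n, '?');
    // Process segments [l, r]: match S[l] with x = prevStart[r][S[l]]
    // (Lemma 2, applied to the subsegment) and recurse on both pieces.
    std::vector<std::pair<int, int>> segs = {{0, n - 1}};
    while (!segs.empty()) {
        auto [l, r] = segs.back();
        segs.pop_back();
        if (l > r) continue;
        int x = prevStart[r][s[l] - 'a'];
        if (x <= l) return "";  // S[l] cannot be matched: no solution
        res[l] = '(';
        res[x] = ')';
        segs.push_back({l + 1, x - 1});
        segs.push_back({x + 1, r});
    }
    return res;
}
```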

The described algorithm has \mathcal O(N \Sigma) time complexity and should score 100 points. Purely linear solutions also exist; we invite you to find them!

