Editorial for JOI '16 Open P2 - Selling RNA Strands


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Subtask 1 [10 points]

N, M, |Si|, |Pj|, |Qj| are small. We can solve this subtask by checking all RNA sequences for each order.

Subtask 2 [25 points]

Let's consider a simpler version of this problem:

There are N RNA sequences S1,,SN and M orders P1,,PM. For each j (1jM), compute the number of i such that the first |Pj| characters of Si are Pj.

We can store all of S1,,SN to a trie. Now we identify a string X and the node of the trie which corresponds to X. For any two strings X,Y, the following are equivalent:

  • X is a prefix of Y, that is, the first |X| characters of Y are X.
  • Node X is an ancestor of node Y.

Let C(S) be the number of occurrences of S in S1,,SN. Then the problem is computing X:predecessor of PjC(X) for each j.

This can be done by precalculation with DFS, but here we use a different approach. Using the Euler-Tour technique, checking whether node x is an ancestor of node y can be reduced to checking whether a point (corresponding to y) is included in a range (corresponding to x). Now we get an algorithm for this problem:

  • Store S1,,SN,P1,,PM to a trie.
  • Using the Euler-Tour technique, compute the range and the point for nodes in the trie. Note that we need not compute the values for nodes which don't correspond to any of S1,,SN,P1,,PM.
  • For each j, count the number of i such that the point corresponding to node Si is included in the range corresponding to node Pj.

Let's go back to the original problem. Now we have the constraint by Qj as well as by Pj. We saw that handling Pj can be done by a trie and the Euler-Tour technique. It can be easily checked that the same approach can be used for handling Qj if we consider S1R,,SNR,Q1R,,QMR (where SR is the reverse string of S).

Finally, we get an algorithm as follows:

  • Store S1,,SN,P1,,PM to a trie.
  • Using the Euler-Tour technique, compute the range and the point for nodes in the trie.
  • Do the same for S1R,,SNR,Q1R,,QMR.
  • For each j, compute the answer by checking two ranges.

Subtask 3 [25 points]

Let ΣS, ΣP, ΣQ be |S1|++|SN|, |P1|++|PM|, |Q1|++|QM|, respectively.

We saw that this problem becomes relatively easy if we can ignore P or Q. Now consider the following algorithm:

  • For each j, extract only RNA sequences whose last |Qj| characters are Qj. Then compute the number of RNA sequences in the extracted ones whose first |Pj| characters are Pj.

The later part of the algorithm can be done by using a trie. The former part also can be done with a trie. This algorithm is not so efficient because it may be possible that all of the RNA sequences are extracted and thus stored to a trie for each j. In this case, it works in O(MΣS). But we can add a (seemingly small) optimization:

  • Extract RNA sequences only once for the same Q.

In fact, it significantly reduces the complexity to O(ΣSΣQ+ΣP+ΣQ).

Here is a proof. For each i, Si is extracted only if Qj is a suffix of Si (that is, the last |Qj| characters of Si are Qj). Let L be the set of such Qj's. Each string of L has different length, so XL|X|1++#L=(#L+1)#L2. Also XL|X|ΣQ holds, so we have (#L+1)#L2ΣQ. Thus #L=O(ΣQ). It means that each Si is added to a trie O(ΣQ) times. Adding Si to a trie once can be done in O(|Si|). So adding strings to tries can be completed overall in O(ΣSΣQ). Now we have tries, so computing the answer can be done by accessing node Pj of a trie. This can be done in O(ΣP). Also, checking all of Q1,,QM itself takes O(ΣQ) time.

After all, it was proved that this algorithm works in O(ΣSΣQ+ΣP+ΣQ).

Subtask 4 [40 points]

The approach to the solution for Subtask 2 is reducing the problem to the following:

There are N points Xi,Yi and M rectangles [Aj,Bj]×[Cj,Dj] (whose axes are parallel to the X or Y axis) on a 2D plane. For each rectangle, compute the number of points which are included in it.

If we ignore irrelevant nodes of tries, the coordinates of the points and the rectangles will be integers between 0 and N+M.

Let K(A,B,C,D) be the number of i such that AXiB and CYiD. As the coordinates of the points are not less than 0, we have K(A,B,C,D)=K(A,B,1,D)K(A,B,1,C1). So we just need to compute K(Ak,Bk,1,Dk) for some queries Ak,Bk,Dk efficiently. This can be done by the following algorithm:

  • Sort the points and the queries by Yi (for points) or Dk (for queries). If a point and a query have the same value, make the point appear earlier in the sequence.
  • Initialize an array V0,,VN+M with 0.
  • Scan the sorted sequence from the beginning and do the following:
    • For a point (X,Y), set VXVX+1.
    • For a query (A,B,1,D), answer VA+VA+1++VB.

In this algorithm, we need to do array operations efficiently. This can be done with a Binary Indexed Tree.


Comments

There are no comments at the moment.