Editorial for COCI '14 Contest 5 #6 Divljak
Submitting an official solution before solving the problem yourself is a bannable offence.
The solution sufficient for partial points was to check whether some of the barbarians' strings is a substring of the word Tarzan gives out and if there is, increment the counter of that word by , repeating this procedure each time Tarzan gives out another word. Regarding the solution for 
 of points, there are multiple solutions, both offline and online. Here we will describe an online solution.
To begin with, let's construct an Aho-Corasick tree, which is a prefix tree. Let's define  as the string obtained by traversing from the root of the tree to node 
 in the prefix tree.
For every node , the Aho-Corasick tree remembers the deepest node 
 such that 
 is a suffix of 
. Let's call such node 
.
Additionally, we need to remember for each node , node 
, where node 
 is the deepest node on which some of the barbarian's strings ends, and at the same time 
 is a suffix of 
. Let's call such node 
.
Now, let's construct an explicit tree from  links. Therefore, for each node on the Aho-Corasick tree, make a new node in the new tree so that the parent of the new node, node 
, is node 
 in the new tree.
Let's call the new node of some node  in the old tree 
. Having the tree constructed this way, by traveling to the root of the new tree from a node 
 we walk through all suffixes of string 
 that are at the same time words written on the tablets of the barbarians.
Now, let's see what operations need to be implemented.
During the update operation (in other words, when Tarzan gives a new word), it is necessary to update all strings that are in it as a substring. How do we find such strings? Let  be the newly given word. Let 
 be the prefix of word 
 up to position 
. We will use a classic traversal of the Aho-Corasick tree that, for each prefix of string 
, 
 finds the longest suffix of that prefix that exists in the Aho-Corasick tree. Formally, for each 
, where 
 is the position of the prefix in string 
, it finds the deepest node 
 such that 
 is the suffix of 
. Let's call such 
 for a 
, 
. If we walk towards the node in our newly generated tree from every node 
, we will walk through all words that are substring of the word 
 and are written on the tablets of the barbarians.
How to find  for each 
? Let's assume that we have 
 for an 
, and try to calculate 
 for 
. Let's call the letter appearing at position 
 in the string 
. In case there is an edge for letter 
 on node 
, then 
 is simply the node we get to when moving for edge 
 from node 
. In the case when such an edge doesn't exist, we try to find such an edge for 
. Therefore, we try to find such an edge for the next longest suffix that is the suffix of the current suffix 
. The second operation is repeated until we get back to the root of the tree or until we find an edge for letter 
.
The construction of this structure is possible in linear time per number of characters . Also, the construction of the new tree is possible in linear time.
Up to now, we currently need to be able to increase all nodes in the union of paths from the root of the new tree to the set of nodes , for each 
, by exactly 
.
Implementing this operation is possible by using heavy-light decomposition and logarithmic structure. Let's decompose the tree to a set of chains such that the number of chains changed on the path from the root of the tree to node  is of the size 
, where 
 is the number of characters in the tree. The remaining task is to increase a part of that chain by exactly 
. To not increase each string only once, but as many times as it appears in string 
, we would simply increase the corresponding part of each chain during our traversal from 
 to the root of the tree by 
. This increase can be very easily done by using a logarithmic structure for each chain.
How will we increase if we want to increase each node only once? We can notice that, on each chain, we will eventually increase only one of its prefixes by exactly  at the end of each update. Accordingly, we can, on each chain, remember up to what prefix had that chain been increased, and in the case that the current increase increases a few other nodes, we increase only that difference set between nodes. On a concrete example, let's say that so far in this update, for some chain the prefix of size 
 had been increased by 
. Now comes a new change on the chain that requires us to increase the prefix of size 
. In this case, we will not increase the prefix of size 
, but will only increase position 
. This is the way we handle the increase of each string by exactly 
. There are a few other ways in which we could handle this increase.
The total complexity of this approach is , where 
 is the number of total characters. The space complexity is 
.
Figuring out the offline solution of the same complexity is left as an exercise to the reader. (Hint: suffix array, union-find, BST.)
Comments