Editorial for DMOPC '20 Contest 2 P5 - Majority Subarrays
Submitting an official solution before solving the problem yourself is a bannable offence.
Author:
Subtask 1
For subtask 1, we can loop over all O(n^2) subarrays and check whether each has a majority element by looping over all of its elements and counting the number of times each occurs, which is O(n^2) per subarray.
Time complexity: O(n^4)
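As a sketch, this brute force might look as follows (the function name is illustrative, not from the official solution):

```python
def count_majority_subarrays_bruteforce(a):
    # Try every subarray a[i..j]; for each, count the occurrences of every
    # element directly to see if some element occurs more than half the time.
    n = len(a)
    total = 0
    for i in range(n):
        for j in range(i, n):
            length = j - i + 1
            for k in range(i, j + 1):  # candidate element a[k]
                occ = sum(1 for t in range(i, j + 1) if a[t] == a[k])
                if 2 * occ > length:   # strict majority
                    total += 1
                    break
    return total
```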
Subtask 2
For subtask 2, we can speed up counting the number of times an element occurs by precomputing a 2-dimensional table of how many times each number occurs in each prefix of the array, allowing us to query the number of occurrences of a number in any subarray in O(1).
Time complexity: O(n^3)
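A sketch of this idea, assuming the values lie in 1..n (the function name is illustrative):

```python
def count_majority_subarrays_prefix(a):
    # cnt[v][i] = number of occurrences of value v in the prefix a[0..i-1].
    n = len(a)
    cnt = [[0] * (n + 1) for _ in range(n + 1)]
    for v in range(1, n + 1):
        for i in range(n):
            cnt[v][i + 1] = cnt[v][i] + (1 if a[i] == v else 0)
    total = 0
    for i in range(n):
        for j in range(i, n):
            length = j - i + 1
            # Each candidate value is now checked in O(1).
            if any(2 * (cnt[v][j + 1] - cnt[v][i]) > length
                   for v in range(1, n + 1)):
                total += 1
    return total
```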
Subtask 3
For subtask 3, there are several ways to further improve the complexity. We describe two of them below.
One approach is to use randomization to check whether a subarray has a majority element. Note that if we randomly choose some element of the subarray, there is a 1/2 chance or better that it is a majority element, if one exists. Thus, after precomputing the occurrence table, we can check some number (say k) of random elements in each subarray in order to determine whether it definitely has a majority element or very likely does not. If we pick k = 30, the probability that no subarray is incorrectly determined to not have a majority element is at least (1 - 2^{-30})^{n^2}, which is good enough for the bounds of this subtask.
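A sketch of the randomized check, again assuming values in 1..n (the function name and the choice of 30 checks are illustrative):

```python
import random

def count_majority_subarrays_randomized(a, checks=30):
    n = len(a)
    # Same prefix occurrence table as in subtask 2.
    cnt = [[0] * (n + 1) for _ in range(n + 1)]
    for v in range(1, n + 1):
        for i in range(n):
            cnt[v][i + 1] = cnt[v][i] + (1 if a[i] == v else 0)
    total = 0
    for i in range(n):
        for j in range(i, n):
            length = j - i + 1
            # A majority element (if any) is sampled with probability >= 1/2
            # per draw, so 30 draws fail with probability at most 2^-30.
            for _ in range(checks):
                v = a[random.randint(i, j)]
                if 2 * (cnt[v][j + 1] - cnt[v][i]) > length:
                    total += 1
                    break
    return total
```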
Another approach, which will help us in further subtasks, is to note that each subarray has at most one majority element, so it suffices to count, for each number x between 1 and n, the number of subarrays where x occurs more than half the time. For each number x, consider the array b where b_i = 1 if a_i = x and b_i = -1 if a_i ≠ x. We need to find the number of subarrays of b with a positive sum, which can be done by taking the prefix sum array of b and using a binary indexed tree while looping over the array to count the number of previously seen prefix sums that are less than the current one.
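The positive-sum subarray count for one number might be sketched like this (the function name is illustrative; prefix sums are shifted by +n so they index into the BIT):

```python
def count_positive_sum_subarrays(b):
    # b is a +/-1 array; count subarrays with positive sum using a BIT over
    # prefix sums, which all lie in [-n, n].
    n = len(b)
    size = 2 * n + 1
    bit = [0] * (size + 1)

    def update(i):            # record one occurrence of (shifted) value i
        i += 1
        while i <= size:
            bit[i] += 1
            i += i & (-i)

    def query(i):             # number of recorded values <= i
        i += 1
        s = 0
        while i > 0:
            s += bit[i]
            i -= i & (-i)
        return s

    total = 0
    s = n                     # prefix sum 0, shifted by +n
    update(s)                 # the empty prefix
    for x in b:
        s += x
        total += query(s - 1) # earlier prefix sums strictly less than s
        update(s)
    return total
```

For example, for a = [1, 1, 2] the number 1 gives b = [1, 1, -1] with 4 positive-sum subarrays and the number 2 gives b = [-1, -1, 1] with 1, matching the total of 5 majority subarrays.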
Time complexity: O(n^2 log n)
Subtask 4
For subtask 4, there are several ways to further improve the complexity. We describe a few of them below.
One easy method is to consider all left endpoints and loop through all right endpoints while maintaining a running frequency table and a maximum frequency value. We can then compare this with the length of the current subarray to determine whether it has a majority element.
Another approach is to modify the standard linear-time majority-vote algorithm (Boyer–Moore) to get a unique candidate for the majority element of each subarray, and use the precomputed occurrence table to check whether that candidate is indeed a majority element. Note that the occurrence table needs to store n^2 integers, which takes roughly 4n^2 bytes of memory with 32-bit integers and fits in the memory limit.
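A sketch of the candidate-based approach, assuming values in 1..n (the function name is illustrative):

```python
def count_majority_subarrays_candidate(a):
    # For each left endpoint, extend to the right while maintaining the
    # Boyer-Moore majority-vote candidate, then verify the candidate
    # against a per-value prefix occurrence table.
    n = len(a)
    cnt = [[0] * (n + 1) for _ in range(n + 1)]
    for v in range(1, n + 1):
        for i in range(n):
            cnt[v][i + 1] = cnt[v][i] + (1 if a[i] == v else 0)
    total = 0
    for i in range(n):
        cand, votes = 0, 0
        for j in range(i, n):
            if votes == 0:
                cand, votes = a[j], 1
            elif a[j] == cand:
                votes += 1
            else:
                votes -= 1
            # If a[i..j] has a majority element, it must equal cand.
            if 2 * (cnt[cand][j + 1] - cnt[cand][i]) > j - i + 1:
                total += 1
    return total
```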
Another way is to optimize the second approach in subtask 3. Note that consecutive updated/queried prefix sums differ by exactly 1, so it suffices to maintain a simple array of the number of times each prefix sum has been seen so far, as well as the current prefix sum and the answer to the last query. If s is the current prefix sum and p is the last answer, then we do `p += seen[s]; s++` if the current b value is 1 and `s--; p -= seen[s]` if the current b value is -1, then add p to a running answer and add 1 to seen[s].
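The BIT-free version for one number might be sketched as follows (the function name is illustrative; prefix sums are again shifted by +n):

```python
def count_positive_sum_subarrays_fast(b):
    # Same count as the BIT approach, but since consecutive prefix sums
    # differ by exactly 1, the number of earlier prefix sums strictly
    # below the current one can be maintained incrementally in O(1).
    n = len(b)
    seen = [0] * (2 * n + 1)  # seen[s] = prefixes with (shifted) sum s
    s = n                     # current prefix sum, shifted by +n
    seen[s] = 1               # the empty prefix
    p = 0                     # earlier prefix sums strictly less than s
    total = 0
    for x in b:
        if x == 1:
            p += seen[s]
            s += 1
        else:
            s -= 1
            p -= seen[s]
        total += p
        seen[s] += 1
    return total
```

Running this once per distinct number, on its ±1 array b, gives the O(n^2) total.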
Time complexity: O(n^2)
Subtask 5
For subtask 5, we need to do better than the quadratic time barrier from either looping through all subarrays or looping through the entire array for each number.
One approach is to optimize counting majority subarrays for numbers that don't occur often. If a number occurs k times in the array, there are O(k^2) possibilities for which contiguous range of these occurrences is part of the subarray, and we can do some math to calculate, for each possibility, the number of subarrays with that number as a majority element. Combined with the O(n) per-number approach from subtask 4, we can process a number that occurs k times in O(min(k^2, n)) time, which is O(n√n) overall with a good constant. Other square root decomposition ideas are possible.
Another approach is to note that there are many long runs of -1s in the b array introduced above. We can take advantage of this to simulate the subtask 4 algorithm quickly using a lazy segment tree that supports adding an arithmetic sequence to a range and answering range sum queries, in O(k log n) time for a number that occurs k times. This gives an O(n log n) algorithm overall that unfortunately has a very bad constant but still passes subtask 5 easily.
Time complexity: O(n√n) or O(n log n)
Subtask 6
For subtask 6, we need an O(n) algorithm.
Let's try to process some number x occurring k times in O(k). We need a better way to take advantage of the long runs of -1s in the b array (recall that b_i = 1 if a_i = x and b_i = -1 if a_i ≠ x). The key is that the long runs of -1s separate the array into small sections that cannot interfere with each other, as "crossing" a long run of -1s will result in the subarray becoming "too negative" for x to occur enough times. Here is one way to formalize this: consider a number line with a pen at position 0 initially. Loop through the b array and move the pen to the right by 1 each time we see a 1 and to the left by 1 each time we see a -1, drawing a "trail" of arrows of length 1 as we go. Every left arrow that does not overlap or touch a right arrow can be effectively ignored, as it does not change the answer. All other left arrows can be paired with the closest right arrow by drawing time that covers or touches them (possibly drawn before or after the left arrow). On the other hand, there are only k right arrows, each of which can only "activate" 4 left arrows. Thus there are at most around 5k important arrows we need to consider, and all other arrows can be effectively ignored. With a proper implementation, this gives the desired O(n) algorithm overall.
Time complexity: O(n)
Comments
For anyone reading in the future, it may be worth clarifying that a majority element must occur strictly more than half the time, not at least half the time.
I also want to mention an O(n log n) solution with a better constant that, with some optimizations, passes under the time limit (my solution has a worst case of 0.895s on the official data). It uses an observation similar to the one given in the editorial (quantifying the idea of being "too negative"): for some number x, let pre_i = b_1 + b_2 + ... + b_i. Only indices i where pre_i ≥ max(pre_0, pre_1, ..., pre_{i-1}) (assume pre_0 = 0) can contribute to the answer (let's call these indices important), and we can prove that the number of important indices is O(k) for a number that occurs k times (look at what happens to the values of pre_i and the running maximum when you change a single b_j from -1 to 1). We can then count the contribution of each important index using a data structure supporting range increment and prefix sum query, which can be done using a BIT.
This solution can probably be optimized to O(n), but I haven't been able to work out the specifics.