CCC '20 S3 - Searching for Strings

View as PDF

Submit solution


Points: 10 (partial)
Time limit: 0.5s
Memory limit: 512M

Problem types
Canadian Computing Competition: 2020 Stage 1, Senior #3

You're given a string N, called the needle, and a string H, called the haystack, both of which contain only lowercase letters a..z.

Write a program to count the number of distinct permutations of N which appear as a substring of H at least once. Note that N can have anywhere between 1 and |N|! distinct permutations in total – for example, the string aab has 3 distinct permutations (aab, aba, and baa).

Input Specification

The first line contains N (1 \le |N| \le 200\,000), the needle string.

The second line contains H (1 \le |H| \le 200\,000), the haystack string.

For 3 of the 15 available marks,|N| \le 8 and |H| \le 200.

For an additional 2 of the 15 available marks,|N| \le 200 and |H| \le 200.

For an additional 2 of the 15 available marks,|N| \le 2\,000 and |H| \le 2\,000.

Output Specification

Output consists of one integer, the number of distinct permutations of N which appear as a substring of H.

Sample Input

aab
abacabaa

Output for Sample Input

2

Explanation of Output for Sample Input

The permutations aba and baa each appear as substrings of H (the former appears twice), while the permutation aab does not appear.


Comments


  • 0
    ross_cleary  commented on March 24, 2020, 7:32 p.m.

    I had a pretty much identical solution on the CCC and it also resulted in a TLE. I think the reason may be that updating the string for every shift of the needle takes O(N) time, although I am not entirely sure though.


  • 3
    Kevy3030  commented on March 24, 2020, 5:00 p.m. edit 6

    Hey there! My code seems to almost get to the end of the last batch, running under 0.02sec for each of those cases, and then suddenly TLEs just a few cases before the end. Is this intended, or is there something I should know about?

    I've spent countless frustrated hours on this already, so any help will be tremendously appreciated.

    More Info: I've read the editorial and looked into Rabin-Karp. However, as I was unaware about this algorithm on the day of the contest, I was just wondering why what I did resulted in almost instant execution up to Test Case 113 and suddenly TLE on Test Case 114. Instead of using hashes I used a vector<int> to store the number of occurances of each letter within the test string and compared that to the needle string. Once a permutation was found the actual string was checked against a map<string, bool> to see if it has already been counted. To shift the test string I basically subtracted 1 from the count of the leftmost letter and added 1 to the count of the rightmost letter.

    I know my algorithm isn't optimal, I just wanted to know if the contest was designed intentionally to run in 0.1sec on Test 113 and TLE people on Test 114, or if there is a size limitation by my algorithm (i.e. exceeding 32-bit integer?) that would cause this?

    EDIT: On the Official CCC grader I'm getting RTE signal 6 on that same test case.


    • 4
      sushi  commented on March 24, 2020, 11:08 p.m.

      Hey there!

      So a couple of things. I strongly advise you to join the dmoj slack so we can exchange messages instead of cluttering up the comments section. In addition, I do not know which submission you are referring to, but looking at the most recent one: https://dmoj.ca/submission/1984929, your current code runs in about O(N^2) time, judging from this for loop:

      for(int i = 0; i < haystack.length() - nL + 1; i++){
          if(currentPermutation == needlePermutation){
             string test;
             for(int k = i; k < i+nL; k++){
                 test.push_back(haystack[k]);
             }
          }
      }
      

      And the result is that this is too slow. please visit the #help section in the dmoj slack so we can help you there. Thank you.


      • 0
        Kevy3030  commented on March 25, 2020, 1:19 p.m. edit 3

        Thank you so much! EDIT: Did I seriously use a forloop instead of substr to generate a substring during the contest? sigh.

        EDIT 2: Well i guess the limit to my algorithm reveals itself now, as now I MLE instead of TLE :/


  • -20
    31501357  commented on March 19, 2020, 11:11 p.m.

    This comment is hidden due to too much negative feedback. Click here to view it.


    • 34
      DiscoverMeme  commented on March 23, 2020, 5:17 p.m. edited

      CCC has become terrible in recent years, thanks to Dr. Troy Vasiga and his team. Lots of server issues and crashes on the main CCC site since at least 2016. Some of them were blamed on the public. This doesn't happen in other programming contests but they got away with it because they didn't provide proper data to back their claims. It is easier to blame your server crashes on the users, your team, or the wind blowing than learn a bit about load balancing to prevent overloading your servers.

      This must have emboldened them to make the 2019 and 2020 Senior contests a disaster. Take for example Problem S3 in 2020 - Searching for Strings. It has efficient solutions that involve string algorithms. However, these algorithms are not in the Canadian high school curricula, let alone computer studies.

      In fact, string algorithms are SPECIFICALLY EXCLUDED from the IOI syllabus and have been so for many years. See here, on page 13: https://people.ksp.sk/~misof/ioi-syllabus/ioi-syllabus.pdf

      String algorithms and data structures: there is evidence that the inter-reducibility between KMP, Rabin-Karp hash-ing, suffix arrays/tree, suffix automaton, and Aho-Corasick makes them diffcult to separate.

      Maybe there are other ways to solve it but they'd be extremely contorted. String algorithms provide a decisive advantage. Unsuspecting kids who attempt to circumvent or reinvent string algorithms will burn out their entire contest time and more. As page 4 of the IOI syllabus makes clear:

      The tasks will be set with the goal that knowledge of Excluded topics should not help in obtaining simpler solutions / solutions worth more points.

      The other problems in 2020 CCC Senior are equally out of whack. Either a strikingly similiar problem to one from 2019 Reprechage, bad data full of whitespace that leaves you wondering why your program crashes, waiting minutes for S3 solutions to judge, or endless constant optimization in S2. No fun graphs or trees which make up the majority of the IOI syllabus.

      Since the organizers clearly didn't design this contest for a good IOI selection, nor to encourage the participants, they look like a bunch of one-trick-ponies who have ruined the contest on purpose. They get to look like geniuses to the corporate sponsors. You know, they get to show they are much smarter than the kids who couldn't invent on the spot all the string algorithms that some of the organizers learned in university.

      Considering that CCC participants will soon compete with the CCC organizers for jobs and grants, botching the contest in 2016-2020 was a clever strategy for the organizers. They also got to make their favorites win by knocking down everyone else with unfair subject matter and server issues. Or maybe not?

      We'll see if the team is willing to fix the 2020 CCC botch. They need to get someone more responsible to redo the contest and properly select the IOI team. This contest used to run smoothly before Troy Vasiga took over and wasted it year after year after year. CCC is a tremendous asset for Canadian education, it is important for thousands and thousands of students each year, and it deserves honest and competent leadership.

      https://dmoj.ca/post/158-ccc-19-problems-posted#comment-9190


      • 17
        2fecta  commented on March 23, 2020, 9:54 p.m. edit 6

        https://dmoj.ca/problem/ccc20s5#comment-11869

        The CCC this year was far to computer science-y. Olympiad students like xiaowuc2 and wieung_bvg really need to stop setting problems. It's all just people who learned the same graph theory and algorithms and put it on the CCC.

        The failed CCC of 2020 is a clear demonstration of how during the months prior, as students across Canada were working hard in preparation for the competition, the contest organizers definitely were not. Despite the fast grader, which allowed sub-optimal solutions to achieve optimal scores, the website crashed several times and participants were often forced to wait in long queues in order to receive feedback for their submission. Supporters of this queue may attempt to justify the wait as an inevitable consequence of the scale of the competition; however, to this I point out the more popular USA Computing Olympiad. Not only does the USACO frequently serve over 6000 users, but it is also hosted over a 4 day period, allowing participants to view and discuss the problems and improve soon after they compete. Even after the frustratingly long CCC window of over 30 days was over, CCC organizers were slow to publicize the problems. Unlike the February 2020 USACO, which was hosted at a later date, CCC 2020 results are still unavailable to this day, and I could not locate an editorial or test data anywhere on the official website. I also couldn’t locate a single word of gratitude towards Mike "MikeMirzayanov" Mirzayanov for his great Polygon and Codeforces systems.

        Incompetence is also shown in the CCC problems, whether it be in the unsorted input of S1, unclear constraints of S2, mundane debugging of S3, or mathematical thinking (!) of problems S4 and S5. It is easy to see the bias in these problem topics due to the problemsetters' shared knowledge perspective at olympiads.

        After moving to Canada, an IOI gold medalist graduated early without participating in this excuse of a contest. But you don’t need to be 300iq to know that the CCC isn’t the “fun challenge for secondary school students” it claims to be. As a result, the Canadian Computer Community has devolved into a more or less homogeneous population of CS nerds who don't have a gf, significantly impacting economic growth. Moreover, this group typically places their individual needs of "[getting] into cco because iT lOoKs GoOd oN [mY] rEsUmE" over the glorious needs of the party. If CCC organizer "Troy Vasiga" has any respect for the advancement of the human race, he will rename the CCC to the Союз Советских Социалистов and purge the all setters who keep putting algorithms in their problems.


        Edit: On a more serious note, it does seem like insufficient preparation has gone into the recent years of CCC. The test data has been, on many occasions, too weak or simply wrong. The constraints should be set so that solutions with incorrect complexities do not pass, while solutions with correct complexities do not require too much constant optimization. The grader and website could be better made to not crash and show the verdict to a submission in a timely manner, like those of other large competitions such as those on Codeforces or Google. For these reasons the score and ranking of many participants deviate from their usual performance on other contests such as DMOJ contests or USACO and are often a poor representation of their programming ability. This is a concern considering CCC scores are used to select the Canadian IOI team. It is also unfair towards students who have spent a lot of time practicing competitive programming and wish to apply for the University of Waterloo.


    • 20
      magicalsoup  commented on March 20, 2020, 11:58 a.m.

      Not really.


      • -9
        Plasmatic  commented on March 20, 2020, 4:58 p.m.

        This comment is hidden due to too much negative feedback. Click here to view it.


        • -11
          31501357  commented on March 20, 2020, 10:48 p.m.

          This comment is hidden due to too much negative feedback. Click here to view it.


          • 18
            aeternalis1  commented on March 20, 2020, 10:53 p.m.

            It would be pretty ridiculous if they just tested the same things year after year. Each algorithm already seen had its "first time" in the CCC at some point, and it just so happens this was the year for rolling hash.


            • -8
              31501357  commented on March 21, 2020, 10:59 a.m.

              This comment is hidden due to too much negative feedback. Click here to view it.


        • -15
          Dan13llljws  commented on March 20, 2020, 8:44 p.m.

          This comment is hidden due to too much negative feedback. Click here to view it.


          • -9
            Plasmatic  commented on March 20, 2020, 11:57 p.m.

            This comment is hidden due to too much negative feedback. Click here to view it.


  • -1
    mathgeek008  commented on March 19, 2020, 9:54 p.m.

    Every test case for my code finishes a bit below 0.1 seconds up until test case #2 of batch #4, where it immediately TLE's. Where could the problem be with my code? Does that test case normally take over 6 times as long as the others?


    • 1
      c  commented on March 19, 2020, 11:02 p.m.

      Take careful consideration of your time complexity and the constraints of the problem.

      If you need further help, feel free to join the slack and ask in #help


    • 0
      Togohogo1  commented on March 19, 2020, 10:16 p.m. edited

      I think that would be the case.


  • 2
    Winbigwok  commented on March 17, 2020, 4:34 p.m.

    please change the sample input to be separated onto 2 lines


    • 3
      wleung_bvg  commented on March 17, 2020, 5:00 p.m.

      The issue has been fixed. I apologize for the confusion.


  • 0
    devnarula  commented on March 15, 2020, 8:14 p.m.

    can someone tell what's wrong with my code? Are my hashing functions wrong?


    • 3
      Dingledooper  commented on March 15, 2020, 8:24 p.m.

      The hasher() function is extermely weak, the best way to check if two strings are permutations of each other is to use a frequency array.


  • 10
    ross_cleary  commented on March 13, 2020, 8:39 p.m.

    Are the other CCC problems going to be posted?


    • 8
      c  commented on March 14, 2020, 1:24 a.m.

      We are waiting for test data for those problems.