Editorial for Baltic OI '18 P5 - Genetics
Submitting an official solution before solving the problem yourself is a bannable offence.
This problem had a simple cubic solution, lots of potential for optimization, and a nice probabilistic quadratic solution. We'll describe the latter.
Let's start by solving the problem for the binary alphabet case. In this case, we can make the math slightly nicer: instead of having the letters A
and C
, we use the numbers
Now, what we want to check is that the dot products are
In less abstract mathematical terms, and generalizing to larger alphabets, we pick random
If the row isn't the answer, it highly likely does not equal that. Changing any
Interesting notes:
Test data generation for this task was pretty tricky. For the naive solution not to pass, we want almost all distances between rows to be
, but all but one row should also have some other row with distance not . The only matrices that the authors are aware of where distances are all equal are identity matrices (with , ), constant matrices ( ), and generalized Hadamard matrices ( , ), and combinations of the three. Here denotes the alphabet size. For a binary alphabet, Hadamard matrices are defined to be matrices of with such that all rows differ in exactly positions. They are well studied, and a simple construction of them is a recursive one: start with the matrix:and repeatedly replace
by:where
means with all entries of XORed by . This results in a Hadamard matrix of any size which is a power of .For alphabets of size
we can do something more complicated, with instead replacing by:Proof that this works is left as an exercise for the reader (the construction is derived from the multiplication table for
, for the mathematically inclined).Given matrices with all pairwise distances
, we can perturb the matrix in various ways to make the answer unique, e.g. duplicating rows or changing bits of the matrix.These complex constructions partly explain the constraints section – it is difficult to construct Hadamard matrices for sizes that are not powers of
for alphabet size .- There is also a fun sub-cubic solution: with the formulation that values are in
, we can think of the problem simply as asking for a matrix product , from where we can check which values are . Matrix multiplication can in theory be computed in time, although in practice the algorithms that do this are very non-trivial to implement and have too high constant factors.
Comments