Marco Bazzani

62.5 IT Mar 22

Probability of super-regular matrices and MDS codes over finite fields

Rathinakumar Appuswamy, Marco Bazzani, Spencer Congero et al.

Let $C$ be an $[n, k]$ linear code chosen uniformly at random over a finite field $\mathbb{F}_q$ of size $q$. The following asymptotic probability of $C$ being maximum distance separable (MDS) as $q,n,k\to\infty$ is known: If $\frac{1}{q}\binom{n}{k} \to 0$, then $P(C\text{ is MDS}) \to 1$. We demonstrate that this growth rate is in fact a threshold by proving: If $\frac{1}{q}\binom{n}{k} \to \infty$, then $P(C\text{ is MDS}) \to 0$. A matrix is (\textit{contiguous}) \textit{super-regular} if all of its (contiguous) square submatrices are nonsingular. The above results imply that for any $k \times k$ matrix $A$ chosen uniformly at random over $\mathbb{F}_q$, the following hold: If $\frac{4^k/\sqrt{k}}{q} \to 0$, then $P(A \text{ is super-regular}) \to 1$. If $\frac{4^k/\sqrt{k}}{q} \to \infty$, then $P(A \text{ is super-regular}) \to 0$. We also obtain the following asymptotic probabilities for two variations of the above questions: If $\frac{1}{q}\binom{n}{k} \to \lambda \in (0,\infty)$ and $k/n \to 0$, then $P(C\text{ is MDS}) \to e^{-\lambda}$. If $\frac{k^3/3}{q} \to \lambda \in (0,\infty)$, then $P(A \text{ is contiguous super-regular}) \to e^{-\lambda}$. The number of contiguous super-regular $3 \times 3$ matrices is also a polynomial. Finally, for $4 \times 4$ matrices, we show that the number of super-regular matrices is not a polynomial, nor even a quasi-polynomial of period less than 7, whereas our experimental evidence suggests that the number of contiguous super-regular matrices is a polynomial.
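To make the definition of (contiguous) super-regularity concrete, here is a minimal brute-force sketch, not taken from the paper. It assumes that $q = p$ is prime, so that arithmetic in $\mathbb{F}_q$ is plain arithmetic modulo $p$, and it assumes that a contiguous submatrix means consecutive rows and consecutive columns. The names `det_mod_p` and `is_super_regular` are hypothetical.

```python
from itertools import combinations


def det_mod_p(rows, p):
    """Determinant of a square matrix modulo a prime p (Gaussian elimination)."""
    m = [[x % p for x in row] for row in rows]
    n = len(m)
    det = 1
    for c in range(n):
        # Find a row at or below the diagonal with a nonzero entry in column c.
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return 0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det  # a row swap flips the sign of the determinant
        det = det * m[c][c] % p
        inv = pow(m[c][c], -1, p)  # modular inverse, Python 3.8+
        for r in range(c + 1, n):
            f = m[r][c] * inv % p
            if f:
                for j in range(c, n):
                    m[r][j] = (m[r][j] - f * m[c][j]) % p
    return det % p


def is_super_regular(a, p, contiguous=False):
    """Check that every (contiguous) square submatrix of a is nonsingular mod p.

    Assumption: 'contiguous' means consecutive rows and consecutive columns.
    """
    k = len(a)
    for s in range(1, k + 1):
        if contiguous:
            index_sets = [tuple(range(i, i + s)) for i in range(k - s + 1)]
        else:
            index_sets = list(combinations(range(k), s))
        for rows in index_sets:
            for cols in index_sets:
                sub = [[a[r][c] for c in cols] for r in rows]
                if det_mod_p(sub, p) == 0:
                    return False
    return True
```

Under these assumptions the check examines $\sum_{s=1}^{k}\binom{k}{s}^2 = \binom{2k}{k} - 1$ square submatrices in general, and $\sum_{s=1}^{k}(k-s+1)^2 = k(k+1)(2k+1)/6 \approx k^3/3$ in the contiguous case. These counts have the same growth as the quantities $4^k/\sqrt{k}$ and $k^3/3$ that appear in the thresholds above.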

63.7 DS May 11

Performance bounds for nearest neighbor search with k-d trees

Marco Bazzani, Sanjoy Dasgupta

The $k$-d tree is one of the oldest and most widely used data structures for nearest neighbor search. It partitions Euclidean space into axis-aligned rectangular cells. There are two standard ways to find the nearest neighbor to a query in a $k$-d tree. Defeatist search returns the closest data point in the query's cell, while comprehensive search also searches other cells as needed to guarantee it finds the nearest neighbor. Both strategies are commonly believed to perform poorly in high dimensions, but there have been few theoretical results explaining this. We prove non-asymptotic bounds on the runtime of comprehensive search and the accuracy of defeatist search. Under mild distributional assumptions, when the dimension $d$ is at least polylogarithmic in the number of data points, defeatist search is no more likely to return the nearest neighbor than random guessing, and comprehensive search visits every cell with high probability. We also show that on uniform data, with high probability, comprehensive search visits at most $2^{\mathcal{O}(d)}$ cells when each cell contains at least logarithmically many data points, and defeatist search returns the nearest neighbor when each cell additionally contains at least $2^{\mathcal{O}(d \log d)}$ data points. Finally, for arbitrary absolutely continuous distributions, we upper bound the expected distance between the query and the point returned by defeatist search.
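The two search strategies can be illustrated with a small sketch. This is not the authors' implementation. It assumes a median split that cycles through the coordinates, and a comprehensive search that prunes a subtree only when the splitting hyperplane is farther from the query than the best point found so far. All names (`Node`, `build_kd_tree`, `defeatist_search`, `comprehensive_search`) and the parameter values in the demo are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Node:
    """A leaf stores points (one cell); an internal node stores a split."""

    def __init__(self, points=None, axis=None, split=None, left=None, right=None):
        self.points, self.axis, self.split = points, axis, split
        self.left, self.right = left, right


def build_kd_tree(points, leaf_size, depth=0):
    """Assumed rule: split at the median, cycling through the axes."""
    if len(points) <= leaf_size:
        return Node(points=points)
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(axis=axis, split=pts[mid][axis],
                left=build_kd_tree(pts[:mid], leaf_size, depth + 1),
                right=build_kd_tree(pts[mid:], leaf_size, depth + 1))


def defeatist_search(root, q):
    """Descend to the query's cell and return the closest point in it."""
    node = root
    while node.points is None:
        node = node.left if q[node.axis] < node.split else node.right
    return min(node.points, key=lambda p: sq_dist(p, q))


def comprehensive_search(root, q):
    """Exact search; returns (nearest point, number of cells visited)."""
    best = [None, float("inf")]
    visited = 0

    def visit(node):
        nonlocal visited
        if node.points is not None:
            visited += 1
            for p in node.points:
                d = sq_dist(p, q)
                if d < best[1]:
                    best[0], best[1] = p, d
            return
        gap = q[node.axis] - node.split
        near, far = (node.left, node.right) if gap < 0 else (node.right, node.left)
        visit(near)
        # The far side can only help if the splitting plane is within reach.
        if gap * gap <= best[1]:
            visit(far)

    visit(root)
    return best[0], visited


# Small demo on uniform data in the unit cube (illustrative values only).
rng = random.Random(0)
pts = [tuple(rng.random() for _ in range(3)) for _ in range(200)]
root = build_kd_tree(pts, leaf_size=8)
query = tuple(rng.random() for _ in range(3))
nearest, cells_visited = comprehensive_search(root, query)
guess = defeatist_search(root, query)
```

In this sketch `cells_visited` is the quantity whose growth the runtime bounds concern, and comparing `guess` with `nearest` over many queries gives an empirical estimate of defeatist accuracy.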
