Mikhail Lepilov


2 Papers

ML, May 25
Fast Spectrum Estimation of Some Kernel Matrices

Mikhail Lepilov

In data science, individual observations are often assumed to come independently from an underlying probability space. Kernel matrices formed from large sets of such observations arise frequently, for example in classification tasks. It is desirable to know the eigenvalue decay properties of these matrices without explicitly forming them, for instance when deciding whether a low-rank approximation is feasible. In this work, we introduce a new eigenvalue quantile estimation framework for some kernel matrices. This framework gives meaningful bounds for all the eigenvalues of a kernel matrix while avoiding the cost of constructing the full matrix. The kernel matrices under consideration come from a kernel that decays quickly away from the diagonal, applied to uniformly distributed sets of points in Euclidean space of any dimension. We prove the efficacy of this framework given certain bounds on the kernel function, and we provide empirical evidence for its accuracy. In the process, we also prove a general interlacing-type theorem for finite sets of numbers. Additionally, we indicate an application of this framework to the study of the intrinsic dimension of data and outline several other directions in which this work can be generalized.
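For reference, the spectrum that the framework estimates can be computed by brute force when the number of points is small: draw uniformly distributed points, form the kernel matrix explicitly, and take its eigenvalues. This is the costly baseline the paper's framework is designed to avoid, not the framework itself. The Gaussian kernel, the bandwidth, the dimension, the sample size and the name `gaussian_kernel_matrix` are all illustrative assumptions.

```python
import numpy as np

def gaussian_kernel_matrix(points, bandwidth):
    """Dense kernel matrix K[i, j] = exp(-||p_i - p_j||^2 / (2 * bandwidth**2)).

    The Gaussian kernel decays quickly as points move apart; it stands in here
    (as an assumption) for the class of kernels treated in the paper."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs**2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

# Uniformly distributed points in the unit square (d = 2); n is kept small
# because this brute-force baseline costs O(n^2) memory and O(n^3) time.
rng = np.random.default_rng(0)
n, d = 1000, 2
pts = rng.uniform(0.0, 1.0, size=(n, d))

K = gaussian_kernel_matrix(pts, bandwidth=0.05)
eigs = np.linalg.eigvalsh(K)  # real eigenvalues in ascending order (K is symmetric)

# Eigenvalue quantiles and a tolerance-based numerical rank.
quantiles = np.quantile(eigs, [0.1, 0.5, 0.9])
numerical_rank = int(np.sum(eigs > 1e-8 * eigs[-1]))
print(quantiles, numerical_rank)
```

On small problems, quantiles computed this way give a ground truth against which a fast estimate can be checked; for large n, forming `K` is exactly the cost the framework sidesteps.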

NA, May 22
Accuracy Analysis of the Proxy Point Method with Applications to Some Toeplitz Matrices

Mikhail Lepilov, Jianlin Xia

For some kernel matrices, low-rank approximations can be obtained quickly via analytic techniques. One important class of analytic methods that has received attention in recent years is based on the use of proxy points. Except for certain special kernels, accuracy analysis of proxy point methods has often been heuristic. For more general kernels, these methods do not specify how many proxy points are needed, or where to place them, to achieve a given approximation accuracy. In this work, we carry out a new analysis of a proxy point method that applies to general complex-analytic kernels. Using an intuitive choice of proxy points, we derive explicit error bounds that decay exponentially with the number of proxy points. This also leads to convenient estimates of the numerical ranks of the relevant kernel matrices. To showcase the utility of this analysis, we apply it to design a new sublinear-time hierarchically semiseparable (HSS) approximation method for certain Toeplitz matrices, including ones that frequently arise in real-world applications. This allows, for example, such matrices to be inverted with lower computational complexity than existing direct methods. Some extensions of these ideas are also discussed.
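The general proxy point idea can be sketched in a few lines. The sketch assumes the Cauchy kernel 1/(x - y) as a standard complex-analytic example, source points x in [-1, 1], target points y with 3 <= |y| <= 5, and proxy points equally spaced on a circle of radius 1.7 between the two sets. The function names, radius, number of proxy points and tolerance are hypothetical choices, not the paper's. The reasoning is as follows. For y outside the circle, the Cauchy integral formula writes 1/(x - y) as a contour integral over the circle, and the trapezoidal rule turns that integral into a sum over the proxy points z_k, so K(X, Y) is approximately K(X, Z) B for some matrix B. A row interpolative decomposition of the small matrix K(X, Z) therefore also compresses K(X, Y), without the compression step ever touching Y.

```python
import numpy as np
from scipy.linalg import qr

def cauchy_kernel(x, y):
    """K(x, y) = 1 / (x - y) for all pairs (rows indexed by x, columns by y)."""
    return 1.0 / np.subtract.outer(x, y)

def proxy_point_compress(x, kernel, center, radius, num_proxy, tol=1e-10):
    """Hypothetical helper: pick skeleton indices J of x and a matrix U such that
    kernel(x, y) ~= U @ kernel(x[J], y) for targets y well outside the circle.

    Proxy points are equally spaced on |z - center| = radius (an illustrative
    choice, not necessarily the one analyzed in the paper)."""
    theta = 2.0 * np.pi * np.arange(num_proxy) / num_proxy
    z = center + radius * np.exp(1j * theta)
    a = kernel(x, z)  # small len(x)-by-num_proxy matrix; targets are never used

    # Column-pivoted QR of a^T ranks the rows of a by how much they add.
    _, r_fac, perm = qr(a.T, mode="economic", pivoting=True)
    diag = np.abs(np.diag(r_fac))
    rank = int(np.sum(diag > tol * diag[0]))
    skel = np.sort(perm[:rank])

    # Interpolation matrix U solving U @ a[skel] ~= a in the least-squares sense.
    u_t, *_ = np.linalg.lstsq(a[skel].T, a.T, rcond=None)
    return skel, u_t.T

x = np.linspace(-1.0, 1.0, 200)
y = np.concatenate([np.linspace(-5.0, -3.0, 150), np.linspace(3.0, 5.0, 150)])

skel, u = proxy_point_compress(x, cauchy_kernel, center=0.0, radius=1.7, num_proxy=40)
k_full = cauchy_kernel(x, y)
k_approx = u @ cauchy_kernel(x[skel], y)
rel_err = np.linalg.norm(k_full - k_approx) / np.linalg.norm(k_full)
print(len(skel), rel_err)
```

In this setup the trapezoidal-rule error behaves roughly like (|x|/radius)^num_proxy + (radius/|y|)^num_proxy, so adding proxy points should shrink the error geometrically until the ID tolerance `tol` dominates. This illustrates the exponential decay described above but does not reproduce the paper's bounds.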