On the Size of the Online Kernel Sparsification Dictionary
This work addresses the computational cost of kernel methods, a practical concern for machine learning practitioners, though it may appear incremental because it builds on existing sparsification techniques.
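The paper analyzes how large the dictionary built by online kernel sparsification can grow. It derives a formula expressing the expected determinant of the kernel Gram matrix in terms of the eigenvalues of the covariance operator, uses it to show that, under certain technical conditions, the dictionary size grows sub-linearly in the number of data points, and concludes that the kernel linear regressor built from the dictionary is consistent.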
We analyze the size of the dictionary constructed from online kernel sparsification, using a novel formula that expresses the expected determinant of the kernel Gram matrix in terms of the eigenvalues of the covariance operator. Using this formula, we are able to connect the cardinality of the dictionary with the eigen-decay of the covariance operator. In particular, we show that under certain technical conditions, the size of the dictionary will always grow sub-linearly in the number of data points, and, as a consequence, the kernel linear regressor constructed from the resulting dictionary is consistent.
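To make the object of study concrete, here is a minimal sketch of how such a dictionary might be built online. The abstract does not spell out the admission rule, so this sketch assumes the commonly used approximate linear dependence test: a new point joins the dictionary only if its squared residual, after projection onto the span of the current dictionary in feature space, exceeds a threshold. The names `rbf_kernel`, `build_dictionary`, `gamma`, and `threshold` are illustrative and not taken from the paper.

```python
import numpy as np


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; an assumed choice, any positive definite kernel works."""
    diff = np.atleast_1d(np.asarray(x, dtype=float)) - np.atleast_1d(np.asarray(y, dtype=float))
    return float(np.exp(-gamma * np.dot(diff, diff)))


def build_dictionary(samples, kernel=rbf_kernel, threshold=1e-2):
    """Process samples one at a time and keep those that are not
    approximately linearly dependent (in feature space) on the
    points already kept."""
    dictionary = []
    for x in samples:
        if not dictionary:
            # Empty dictionary: the residual is just the squared feature norm.
            residual = kernel(x, x)
        else:
            # Gram matrix of the current dictionary and kernel vector of x.
            gram = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
            k_vec = np.array([kernel(a, x) for a in dictionary])
            # Best coefficients for approximating phi(x) by dictionary features.
            coeffs = np.linalg.solve(gram, k_vec)
            # Squared distance from phi(x) to the span of the dictionary.
            residual = kernel(x, x) - float(k_vec @ coeffs)
        if residual > threshold:
            dictionary.append(x)
    return dictionary


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.uniform(0.0, 1.0, size=(500, 1))
    for n in (50, 200, 500):
        print(n, len(build_dictionary(data[:n])))
```

The script prints the dictionary size after 50, 200, and 500 points, which gives an informal view of the growth rate the abstract bounds. This sketch recomputes the Gram matrix and solves a linear system at every step for clarity; a practical implementation would update the inverse incrementally. By the Schur complement identity, the product of the residuals of the admitted points equals the determinant of the dictionary Gram matrix, which suggests why a determinant formula is a natural tool for bounding the dictionary size.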