ML LG | May 16, 2022

Optimal Randomized Approximations for Matrix based Renyi's Entropy

Yuxin Dong, Tieliang Gong, Shujian Yu, Chen Li

arXiv:2205.07426v1 | 9.912 citations | h-index: 24

Originality: Incremental advance

AI Analysis

This work addresses a computational bottleneck for researchers and practitioners using matrix-based Renyi's entropy in large-scale statistical learning tasks, offering an incremental improvement through efficient approximations.

The paper tackles the high computational cost of exactly calculating matrix-based Renyi's entropy, which scales as O(n^3) with sample size, by developing randomized approximations that reduce the complexity to O(n^2sm) with s,m << n, achieving significant speedups with minimal accuracy loss in simulations and real-world applications.

Matrix-based Renyi's entropy enables us to measure information quantities directly from data, without the costly probability density estimation of the underlying distributions, and has therefore been widely adopted in numerous statistical learning and inference tasks. However, calculating this information quantity exactly requires access to the eigenspectrum of a positive semi-definite (PSD) matrix $A$ whose size grows linearly with the number of samples $n$, resulting in an $O(n^3)$ time complexity that is prohibitive for large-scale applications. To address this issue, this paper takes advantage of stochastic trace approximations for matrix-based Renyi's entropy of arbitrary order $\alpha \in \mathbb{R}^+$, lowering the complexity by converting the entropy approximation to a matrix-vector multiplication problem. Specifically, we develop random approximations for integer-order $\alpha$ and polynomial series approximations (Taylor and Chebyshev) for non-integer $\alpha$, leading to an $O(n^2sm)$ overall time complexity, where $s, m \ll n$ denote the number of vector queries and the polynomial order, respectively. We theoretically establish statistical guarantees for all approximation algorithms and give explicit orders of $s$ and $m$ with respect to the approximation error $\varepsilon$, showing optimal convergence rates for both parameters up to a logarithmic factor. Large-scale simulations and real-world applications validate the effectiveness of the developed approximations, demonstrating remarkable speedup with negligible loss in accuracy.
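To make the integer-order case concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the usual definition from the matrix-based Renyi's entropy literature, $S_\alpha(A) = \frac{1}{1-\alpha}\log_2 \mathrm{tr}(A^\alpha)$ with $A$ a trace-normalized kernel Gram matrix, which the abstract itself does not state. It estimates the trace with a plain Hutchinson-style estimator using Rademacher probe vectors. The function names, the Gaussian kernel, and all parameter values are hypothetical choices for illustration.

```python
import numpy as np


def gaussian_gram(X, sigma=1.0):
    """Trace-normalized Gaussian kernel Gram matrix (PSD, trace 1). Kernel choice is an assumption."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma**2))
    return K / np.trace(K)


def renyi_entropy_exact(A, alpha):
    """Reference value via a full eigendecomposition, which costs O(n^3)."""
    eigvals = np.clip(np.linalg.eigvalsh(A), 0.0, None)
    return np.log2(np.sum(eigvals**alpha)) / (1.0 - alpha)


def renyi_entropy_hutchinson(A, alpha, s=100, seed=0):
    """Estimate for integer alpha >= 2 from s random probes, costing O(n^2 * s * alpha)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    G = rng.choice([-1.0, 1.0], size=(n, s))  # Rademacher probe vectors as columns
    V = G.copy()
    for _ in range(alpha):
        V = A @ V  # after the loop, V = A^alpha G, built only from matrix-vector products
    trace_est = np.sum(G * V) / s  # average of g_i^T A^alpha g_i over the s probes
    return np.log2(trace_est) / (1.0 - alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.standard_normal((200, 5))
    A = gaussian_gram(X)
    print("exact :", renyi_entropy_exact(A, 2))
    print("approx:", renyi_entropy_hutchinson(A, 2, s=200))
```

The estimator never forms $A^\alpha$ or its eigenvalues; it only multiplies $A$ by a block of $s$ vectors $\alpha$ times, which is where the $O(n^2 s)$ factor in the stated complexity comes from. The non-integer case described in the abstract would additionally need a polynomial (Taylor or Chebyshev) approximation of $x^\alpha$, which this sketch does not cover.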
