Nonlinear Laplacians: Tunable principal component analysis under directional prior information
This work addresses the problem of signal detection in noisy data with prior constraints for researchers in statistics and machine learning, offering an incremental improvement over existing spectral methods.
The paper tackles the problem of detecting and estimating rank-one signals from noisy observations under prior directional information, such as positive bias, by introducing nonlinear Laplacian algorithms that deform the observation matrix with a nonlinear function. The result shows that these algorithms, when appropriately tuned, substantially outperform direct spectral methods in sparse principal component analysis models, with rigorous theoretical characterization of signal strength requirements.
We introduce a new family of algorithms for detecting and estimating a rank-one signal from a noisy observation under prior information about that signal's direction, focusing on examples where the signal is known to have entries biased to be positive. Given a matrix observation $\mathbf{Y}$, our algorithms construct a nonlinear Laplacian, another matrix of the form $\mathbf{Y}+\mathrm{diag}(σ(\mathbf{Y1}))$ for a nonlinear $σ:\mathbb{R}\to\mathbb{R}$, and examine the top eigenvalue and eigenvector of this matrix. When $\mathbf{Y}$ is the (suitably normalized) adjacency matrix of a graph, our approach gives a class of algorithms that search for unusually dense subgraphs by computing a spectrum of the graph "deformed" by the degree profile $\mathbf{Y1}$. We study the performance of such algorithms compared to direct spectral algorithms (the case $σ=0$) on models of sparse principal component analysis with biased signals, including the Gaussian planted submatrix problem. For such models, we rigorously characterize the strength of rank-one signal, as a function of $σ$, required for an outlier eigenvalue to appear in the spectrum of a nonlinear Laplacian matrix. While identifying the $σ$ that minimizes the required signal strength in closed form seems intractable, we explore three approaches to design $σ$ numerically: exhaustively searching over simple classes of $σ$, learning $σ$ from datasets of problem instances, and tuning $σ$ using black-box optimization of the critical signal strength. We find both theoretically and empirically that, if $σ$ is chosen appropriately, then nonlinear Laplacian spectral algorithms substantially outperform direct spectral algorithms, while retaining the conceptual simplicity of spectral methods compared to broader classes of computations like approximate message passing or general first order methods.