Input-Label Correlation Governs a Linear-to-Nonlinear Transition in Random Features under Spiked Covariance
This resolves a tension in understanding when RFMs provide nonlinear gains, which is important for machine learning practitioners using simple neural network architectures.
The paper shows that random feature models (RFMs) exhibit a phase transition from linear to nonlinear behavior depending on input-label correlation under spiked-covariance data, with RFMs outperforming linear methods only above a specific correlation threshold.
Random feature models (RFMs), two-layer networks with a randomly initialized fixed first layer and a trained linear readout, are among the simplest nonlinear predictors. Prior asymptotic analyses in the proportional high-dimensional regime show that, under isotropic data, RFMs reduce to noisy linear models and offer no advantage over classical linear methods such as ridge regression. Yet RFMs frequently outperform linear baselines on structured real data. We show that this tension is explained by a correlation-driven phase transition: under spiked-covariance designs, the interaction between anisotropy and input-label correlation determines whether the RFM behaves as an effectively linear predictor or exhibits genuinely nonlinear gains. Concretely, we establish a universality principle under anisotropy and characterize the RFM generalization error via an equivalent noisy polynomial model. The effective degree of this polynomial, equivalently, which Hermite orders of the activation survive, is governed by the strength of input-label correlation, yielding an explicit boundary in the correlation-spike-magnitude plane. Below the boundary, the RFM collapses to a linear surrogate and can underperform strong linear baselines; above it, higher-order terms persist and the RFM achieves a clear nonlinear advantage. Numerical simulations and real-data experiments corroborate the theory and delineate the transition between these two regimes.