Optimal Activation Functions for the Random Features Regression Model
This work provides incremental theoretical insights into choosing activation functions for Random Features Regression, which could lower test error and sensitivity in machine learning applications.
The authors identified, in closed form, the family of activation functions that minimize a combination of test error and sensitivity in Random Features Regression under functional parsimony constraints, finding optimal cases that include linear, saturated linear, and Hermite polynomial forms. They also showed how these optimal functions affect key model properties such as the double descent curve and the dependence of the optimal regularization parameter on the observation noise level.
The asymptotic mean squared test error and sensitivity of the Random Features Regression model (RFR) have recently been studied. We build on this work and identify in closed form the family of Activation Functions (AFs) that minimize a combination of the test error and sensitivity of the RFR under different notions of functional parsimony. We find scenarios under which the optimal AFs are linear, saturated linear functions, or expressible in terms of Hermite polynomials. Finally, we show how using optimal AFs impacts well-established properties of the RFR model, such as its double descent curve, and the dependence of its optimal regularization parameter on the observation noise level.
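To make the setting concrete, the following is a minimal sketch of a random features regression fit with a swappable activation function. It is an illustration under stated assumptions, not the authors' experimental setup: the Gaussian inputs, the linear target, the 1/sqrt(d) scaling, the clipping threshold `c`, and all function names (`random_features`, `fit_rfr`, `predict_rfr`, `saturated_linear`, `demo`) are hypothetical choices made here. In particular, the saturated linear function below uses an arbitrary threshold rather than the optimal parameters derived in the paper.

```python
import numpy as np

def random_features(X, W, activation):
    """Map inputs X (n x d) to features activation(X W^T / sqrt(d)) (n x N)."""
    d = X.shape[1]
    return activation(X @ W.T / np.sqrt(d))

def fit_rfr(X, y, W, activation, lam):
    """Ridge-regress y on the random features; only the second layer is trained."""
    Z = random_features(X, W, activation)
    N = Z.shape[1]
    # Solve (Z^T Z + lam * I) a = Z^T y for the second-layer weights a.
    return np.linalg.solve(Z.T @ Z + lam * np.eye(N), Z.T @ y)

def predict_rfr(X, W, activation, a):
    """Predict with fixed random first layer W and fitted second layer a."""
    return random_features(X, W, activation) @ a

def saturated_linear(t, c=1.0):
    """Linear on [-c, c], constant outside; c is an illustrative threshold."""
    return np.clip(t, -c, c)

def demo(seed=0, n=200, d=20, N=100, lam=1e-1, noise=0.1):
    """Compare test MSE across activations on synthetic data (assumed setup)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((N, d))             # random, untrained first layer
    beta = rng.standard_normal(d) / np.sqrt(d)  # assumed linear target
    X_train = rng.standard_normal((n, d))
    y_train = X_train @ beta + noise * rng.standard_normal(n)
    X_test = rng.standard_normal((1000, d))
    y_test = X_test @ beta                      # noiseless test targets

    activations = {
        "linear": lambda t: t,
        "saturated_linear": saturated_linear,
        "relu": lambda t: np.maximum(t, 0.0),
    }
    errors = {}
    for name, act in activations.items():
        a = fit_rfr(X_train, y_train, W, act, lam)
        pred = predict_rfr(X_test, W, act, a)
        errors[name] = float(np.mean((pred - y_test) ** 2))
    return errors

if __name__ == "__main__":
    for name, err in demo().items():
        print(f"{name}: test MSE = {err:.4f}")
```

Sweeping `N` relative to `n`, or `lam` relative to `noise`, in a loop around `demo` is one way to observe empirically how the choice of activation changes the shape of the test error curve in this toy setting.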