Flexible, Non-parametric Modeling Using Regularized Neural Networks
This work addresses the challenge of automatically selecting the functional forms of additive components in non-parametric statistical modeling, a task that otherwise requires non-trivial data exploration from statisticians and data scientists seeking flexible yet interpretable models.
This paper introduces PrAda-net, a one-hidden-layer neural network trained with proximal gradient descent and adaptive lasso, designed for non-parametric statistical modeling. PrAda-net automatically adjusts its architecture to data complexity, yielding a compact network translatable into interpretable additive model components. It was demonstrated on simulated data, showing competitive test error and variable identification, and applied to the U.K. black smoke dataset to model complex spatial and temporal data.
Non-parametric additive models are able to capture complex data dependencies in a flexible yet interpretable way. However, choosing the functional form of the additive components often requires non-trivial data exploration. Here, as an alternative, we propose PrAda-net, a one-hidden-layer neural network trained with proximal gradient descent and adaptive lasso. PrAda-net automatically adjusts the size and architecture of the neural network to reflect the complexity and structure of the data. The compact network obtained by PrAda-net can be translated to additive model components, making it suitable for non-parametric statistical modeling with automatic model selection. We demonstrate PrAda-net on simulated data, where we compare its test error performance, variable importance and variable subset identification properties to those of other lasso-based regularization approaches for neural networks. We also apply PrAda-net to the massive U.K. black smoke data set to demonstrate how it can be used to model complex and heterogeneous data with spatial and temporal components. In contrast to classical statistical non-parametric approaches, PrAda-net requires no preliminary modeling to select the functional forms of the additive components, yet still results in an interpretable model representation.
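As an illustration only, the two mechanisms named above can be sketched in NumPy: a proximal gradient update with an l1 (lasso) penalty, which sets small weights exactly to zero, and the reading of a sparse one-hidden-layer network as additive components by grouping hidden units according to the inputs they remain connected to. All function names, the toy weight matrix, and the step size and penalty values below are assumptions made for this sketch and are not taken from the paper. The adaptive factors use the standard adaptive-lasso form 1/|w|^gamma, which may differ from the exact choice made in PrAda-net.

```python
import numpy as np

def soft_threshold(w, thresh):
    """Proximal operator of a (weighted) l1 penalty: shrink towards zero."""
    return np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)

def proximal_step(W, grad, step, lam, penalty_weights=None):
    """One proximal gradient update: gradient step, then soft-thresholding.

    penalty_weights holds per-weight factors (all ones = plain lasso).
    """
    if penalty_weights is None:
        penalty_weights = np.ones_like(W)
    return soft_threshold(W - step * grad, step * lam * penalty_weights)

def adaptive_weights(W_init, gamma=1.0, eps=1e-8):
    """Adaptive-lasso factors: small initial weights get large penalties."""
    return 1.0 / (np.abs(W_init) ** gamma + eps)

def additive_components(W_in):
    """Group hidden units by the set of inputs they are still connected to.

    W_in has shape (n_hidden, n_inputs). Each key is a tuple of input
    indices and corresponds to one additive component of the model.
    """
    groups = {}
    for h, row in enumerate(W_in):
        active = tuple(np.flatnonzero(row).tolist())
        if active:  # hidden units with no remaining inputs are dropped
            groups.setdefault(active, []).append(h)
    return groups

# Hypothetical input-to-hidden weights: 3 hidden units, 3 inputs.
W = np.array([[0.90,  0.05,  0.00],
              [0.02, -0.70,  0.60],
              [0.80,  0.01, -0.03]])
grad = np.zeros_like(W)  # zero gradient, to isolate the thresholding effect

W_new = proximal_step(W, grad, step=1.0, lam=0.1)
groups = additive_components(W_new)
print(groups)  # {(0,): [0, 2], (1, 2): [1]}
```

In this toy case, hidden units 0 and 2 depend only on input 0 and would together form a univariate component, while hidden unit 1 depends on inputs 1 and 2 and would form a bivariate interaction component. Passing `penalty_weights=adaptive_weights(W_init)` to `proximal_step`, with `W_init` taken from an earlier fit, turns the plain lasso step into an adaptive one.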