MLAug 18, 2024
Deep Limit Model-free Prediction in RegressionKejin Wu, Dimitris N. Politis
In this paper, we provide a novel Model-free approach based on Deep Neural Network (DNN) to accomplish point prediction and prediction interval under a general regression setting. Usually, people rely on parametric or non-parametric models to bridge dependent and independent variables (Y and X). However, this classical method relies heavily on the correct model specification. Even for the non-parametric approach, some additive form is often assumed. A newly proposed Model-free prediction principle sheds light on a prediction procedure without any model assumption. Previous work regarding this principle has shown better performance than other standard alternatives. Recently, DNN, one of the machine learning methods, has received increasing attention due to its great performance in practice. Guided by the Model-free prediction idea, we attempt to apply a fully connected forward DNN to map X and some appropriate reference random variable Z to Y. The targeted DNN is trained by minimizing a specially designed loss function so that the randomness of Y conditional on X is outsourced to Z through the trained DNN. Our method is more stable and accurate compared to other DNN-based counterparts, especially for optimal point predictions. With a specific prediction procedure, our prediction interval can capture the estimation variability so that it can render a better coverage rate for finite sample cases. The superior performance of our method is verified by simulation and empirical studies.
MLMay 14, 2024
Scalable Subsampling Inference for Deep Neural NetworksKejin Wu, Dimitris N. Politis
Deep neural networks (DNN) has received increasing attention in machine learning applications in the last several years. Recently, a non-asymptotic error bound has been developed to measure the performance of the fully connected DNN estimator with ReLU activation functions for estimating regression models. The paper at hand gives a small improvement on the current error bound based on the latest results on the approximation ability of DNN. More importantly, however, a non-random subsampling technique--scalable subsampling--is applied to construct a `subagged' DNN estimator. Under regularity conditions, it is shown that the subagged DNN estimator is computationally efficient without sacrificing accuracy for either estimation or prediction tasks. Beyond point estimation/prediction, we propose different approaches to build confidence and prediction intervals based on the subagged DNN estimator. In addition to being asymptotically valid, the proposed confidence/prediction intervals appear to work well in finite samples. All in all, the scalable subsampling DNN estimator offers the complete package in terms of statistical inference, i.e., (a) computational efficiency; (b) point estimation/prediction accuracy; and (c) allowing for the construction of practically useful confidence and prediction intervals.
STSep 17, 2020
Ridge Regression Revisited: Debiasing, Thresholding and BootstrapYunyi Zhang, Dimitris N. Politis
The success of the Lasso in the era of high-dimensional data can be attributed to its conducting an implicit model selection, i.e., zeroing out regression coefficients that are not significant. By contrast, classical ridge regression can not reveal a potential sparsity of parameters, and may also introduce a large bias under the high-dimensional setting. Nevertheless, recent work on the Lasso involves debiasing and thresholding, the latter in order to further enhance the model selection. As a consequence, ridge regression may be worth another look since -- after debiasing and thresholding -- it may offer some advantages over the Lasso, e.g., it can be easily computed using a closed-form expression. % and it has similar performance to threshold Lasso. In this paper, we define a debiased and thresholded ridge regression method, and prove a consistency result and a Gaussian approximation theorem. We further introduce a wild bootstrap algorithm to construct confidence regions and perform hypothesis testing for a linear combination of parameters. In addition to estimation, we consider the problem of prediction, and present a novel, hybrid bootstrap algorithm tailored for prediction intervals. Extensive numerical simulations further show that the debiased and thresholded ridge regression has favorable finite sample performance and may be preferable in some settings.