Generalized Resubstitution for Regression Error Estimation
This work addresses error estimation for regression models. It provides more reliable estimators that could benefit statistical modeling and machine learning applications, although the contribution appears incremental because it builds on existing resubstitution methods.
The paper tackles error estimation in regression by proposing generalized resubstitution estimators. The estimators are proved consistent under broad assumptions, and experiments with polynomial regression show better finite-sample bias and variance than the standard resubstitution estimator.
We propose generalized resubstitution error estimators for regression, a broad family of estimators, each corresponding to a choice of empirical probability measure and loss function. The usual sum of squares criterion is the special case corresponding to the standard empirical probability measure and the quadratic loss. Other choices of empirical probability measure lead to more general estimators with superior bias and variance properties. We prove that these error estimators are consistent under broad assumptions. In addition, we propose and investigate procedures for choosing the empirical measure based on the method of moments and maximum pseudo-likelihood. Detailed experimental results using polynomial regression demonstrate the superior finite-sample bias and variance properties of the proposed estimators. The R code for the experiments is provided.
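To make the idea of "a choice of empirical probability measure and loss function" concrete, here is a minimal Python sketch. It is not the authors' R code. It treats the error estimate as the average loss of the fitted model under a chosen empirical measure. With point masses at the training points and the quadratic loss, this is the usual mean sum of squares. As an illustrative assumption only, the alternative measure used here places an isotropic Gaussian kernel on each training point and is integrated by Monte Carlo; the source does not say which measures the paper uses. The names `fit_line`, `generalized_resubstitution`, `sigma_x`, and `sigma_y` are hypothetical, and a straight-line fit stands in for polynomial regression to keep the example short.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (degree-1 polynomial)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def quadratic_loss(y, y_hat):
    return (y - y_hat) ** 2

def generalized_resubstitution(xs, ys, predict, loss,
                               sigma_x=0.0, sigma_y=0.0, draws=1, seed=0):
    """Average loss of `predict` under a chosen empirical measure.

    sigma_x = sigma_y = 0 gives point masses at the training points,
    i.e. standard resubstitution. Positive sigmas give a Gaussian-smoothed
    measure (an assumption of this sketch), integrated by Monte Carlo.
    """
    rng = random.Random(seed)
    total = 0.0
    for x, y in zip(xs, ys):
        for _ in range(draws):
            x_draw = rng.gauss(x, sigma_x) if sigma_x > 0 else x
            y_draw = rng.gauss(y, sigma_y) if sigma_y > 0 else y
            total += loss(y_draw, predict(x_draw))
    return total / (len(xs) * draws)

# Toy data, made up for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.2, 5.8, 7.1]

a, b = fit_line(xs, ys)

def predict(x):
    return a + b * x

# Standard resubstitution: point-mass measure, quadratic loss.
plain = generalized_resubstitution(xs, ys, predict, quadratic_loss)

# Smoothed measure: each training point is replaced by a Gaussian kernel.
smooth = generalized_resubstitution(xs, ys, predict, quadratic_loss,
                                    sigma_x=0.2, sigma_y=0.2, draws=5000)

print("standard resubstitution:", plain)
print("Gaussian-smoothed resubstitution:", smooth)
```

For a linear fit and independent Gaussian kernels, the smoothed estimate has the closed form `plain + sigma_y**2 + b**2 * sigma_x**2`, so in this sketch the smoothing adds a positive correction to the standard estimate, which tends to be optimistically low. How the kernel widths should be chosen is the question the abstract assigns to the method-of-moments and maximum pseudo-likelihood procedures; the values above are arbitrary.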