SPPCSO: Adaptive Penalized Estimation Method for High-Dimensional Correlated Data
This work addresses the instability of traditional variable selection methods on high-dimensional, high-noise, correlated data, a problem faced by researchers and practitioners who work with such datasets.
This paper introduces SPPCSO, a penalized estimation method for high-dimensional correlated data that integrates single-parametric principal component regression with L1 regularization. It aims to improve model stability and predictive accuracy by using principal component information to adaptively adjust the shrinkage factor. In experiments, it yields stable and reliable estimation in high-noise settings and accurately distinguishes signal variables from noise variables.
With the rise of high-dimensional correlated data, multicollinearity poses a significant challenge to model stability, often leading to unstable estimates and reduced predictive accuracy. This work proposes the Single-Parametric Principal Component Selection Operator (SPPCSO), a penalized estimation method that integrates single-parametric principal component regression with $L_{1}$ regularization and uses principal component information to adaptively adjust the shrinkage factor. The approach balances variable selection against coefficient estimation, keeping the model stable and the estimates robust even in high-dimensional, high-noise environments. Its primary contribution is to address the instability of traditional variable selection methods on high-noise, high-dimensional correlated data. Theoretically, the method is selection consistent and achieves a smaller estimation error bound than traditional penalized estimation approaches. Extensive numerical experiments show that SPPCSO delivers stable and reliable estimation in high-noise settings and accurately distinguishes signal variables from noise variables in data with group-effect structure and highly correlated noise variables, eliminating redundant variables and yielding more stable variable selection. In gene expression data analysis, SPPCSO also identifies disease-associated genes, which illustrates its practical value. The results indicate that SPPCSO is well suited to high-dimensional variable selection, offering an efficient and interpretable solution for modeling correlated data.
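The abstract does not state the SPPCSO estimator itself, so the following is only a generic illustration of the underlying idea: letting principal component information modulate an $L_{1}$ penalty. As a stand-in, the sketch fits a principal component regression (PCR) and uses it to set per-coefficient weights in a weighted lasso, following the familiar adaptive-lasso pattern. It is not the authors' algorithm. The function names, the number of components `k`, the penalty level `lam`, the weight formula, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def pcr_estimate(X, y, k):
    """Principal component regression on the top-k components (X assumed centered)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Regress y on the first k component scores, then map back to coefficients.
    return Vt[:k].T @ ((U[:, :k].T @ y) / s[:k])

def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate updates."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # residual y - X @ beta, with beta = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (coordinate j removed).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

# Hypothetical synthetic data: columns share a latent factor, so they are correlated.
rng = np.random.default_rng(0)
n, p = 100, 20
latent = rng.standard_normal((n, 1))
X = 0.8 * latent + 0.6 * rng.standard_normal((n, p))
X -= X.mean(axis=0)
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_normal(n)
y -= y.mean()

# Assumed weighting rule: penalize a coefficient less when the PCR fit makes it large.
b_pcr = pcr_estimate(X, y, k=5)
w = 1.0 / (np.abs(b_pcr) + 1e-3)
b_hat = weighted_lasso(X, y, lam=0.05, weights=w)
print("selected indices:", np.flatnonzero(b_hat))
```

The PCR step is stable under multicollinearity because it discards low-variance directions, while the weighted $L_{1}$ step performs the actual selection. How SPPCSO combines the two through a single shrinkage parameter is specified in the paper body, not in this sketch.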