ME LG CO ML Feb 2, 2024

Conditional Mean and Variance Estimation via \textit{k}-NN Algorithm with Automated Variance Selection

Marcos Matabuena, Juan C. Vidal, Oscar Hernan Madrid Padilla, Jukka-Pekka Onnela

arXiv:2402.01635v23.36 citations h-index: 53

Originality: Incremental advance

AI Analysis

This work addresses the need for accurate conditional distribution estimation in statistical modeling, offering an incremental improvement over existing k-NN methods.

The paper tackles joint estimation of the conditional mean and variance with a k-NN regression method. It shows that the estimator can achieve fast convergence rates and reports, through simulations and a biomedical application, better empirical performance than the conventional k-NN regression algorithm.

We introduce a novel \textit{k}-nearest neighbor (\textit{k}-NN) regression method for joint estimation of the conditional mean and variance. The proposed algorithm preserves the computational efficiency and manifold-learning capabilities of classical non-parametric \textit{k}-NN models, while integrating a data-driven variable selection step that improves empirical performance. By accurately estimating both the conditional mean and the conditional variance regression functions, the method effectively reconstructs the conditional distribution and density functions for multiple families of scale-and-localization generative models. We show that our estimator can achieve fast convergence rates, and we derive practical rules for selecting the smoothing parameter~$k$ that enhance the precision of the algorithm in finite-sample regimes. Extensive simulations for low-, moderate-, and high-dimensional covariate spaces, together with a real-world biomedical application, demonstrate that the proposed method can consistently outperform the conventional \textit{k}-NN regression algorithm while producing more interpretable model output.
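To make the idea of joint mean and variance estimation concrete, here is a minimal sketch of a plain k-NN estimator that averages the responses of the k nearest neighbors for the mean and averages their squared deviations for the variance. It is a generic textbook construction, not the authors' algorithm: it omits the variable selection step and the rules for choosing k described in the abstract. The function name `knn_mean_variance`, the Euclidean metric, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def knn_mean_variance(X_train, y_train, X_query, k):
    """Plain k-NN estimates of E[Y | X = x] and Var[Y | X = x] (illustrative only)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest training points for each query point
    # (unordered within the k, which is fine for averaging).
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    y_nb = y_train[idx]                       # neighbor responses, (n_query, k)
    mean = y_nb.mean(axis=1)                  # conditional mean estimate
    # Conditional variance estimate: local average of squared deviations
    # from the local mean.
    var = ((y_nb - mean[:, None]) ** 2).mean(axis=1)
    return mean, var

# Hypothetical heteroscedastic example: noise scale grows with |x|.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2000, 1))
noise_sd = 0.2 + 0.3 * np.abs(X[:, 0])
y = np.sin(3.0 * X[:, 0]) + noise_sd * rng.standard_normal(2000)

mean_hat, var_hat = knn_mean_variance(X, y, np.array([[0.0], [0.9]]), k=100)
```

Under a location-scale model of the form Y = m(X) + sigma(X) * eps, the pair (`mean_hat`, `var_hat`) together with an assumed distribution for eps is enough to approximate the conditional distribution at each query point, which is the reconstruction step the abstract refers to.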
