ML AI LG ME | Dec 28, 2021

Improving Nonparametric Classification via Local Radial Regression with an Application to Stock Prediction

Ruixing Cao, Akifumi Okuno, Kei Nakagawa, Hidetoshi Shimodaira

arXiv:2112.13951v21.9

Originality: Incremental advance

AI Analysis

This work addresses a specific bias issue in nonparametric classification for applications like stock prediction, representing an incremental improvement over prior methods.

The paper tackles the asymptotic bias of nonparametric classifiers such as the kernel smoother and k-NN. Existing corrections, local polynomial regression (LPoR) and multiscale k-NN (MS-k-NN), are shown to be optimal only in the limit of infinitely many training samples. To correct the bias with fewer observations, the paper proposes local radial regression (LRR) and its logistic variant (LRLR), which use the radial distance from the query as the only explanatory variable. The authors prove a convergence rate for the $L^2$ risk of LRR and report experiments, including on real-world daily stock index datasets, in which LRLR outperforms LPoR and MS-k-NN.

For supervised classification problems, this paper considers estimating the query's label probability through local regression using observed covariates. The well-known nonparametric kernel smoother and $k$-nearest neighbor ($k$-NN) estimator, which average the labels over a ball around the query, are consistent but asymptotically biased, particularly when the radius of the ball is large. To eliminate this bias, local polynomial regression (LPoR) and multiscale $k$-NN (MS-$k$-NN) learn the bias term by local regression around the query and extrapolate it to the query itself. However, their theoretical optimality has been shown only in the limit of an infinite number of training samples. To correct the asymptotic bias with fewer observations, this paper proposes local radial regression (LRR) and its logistic regression variant, local radial logistic regression (LRLR), which combine the advantages of LPoR and MS-$k$-NN. The idea is simple: we fit a local regression to the observed labels, taking only the radial distance from the query as the explanatory variable, and then extrapolate the estimated label probability to zero distance. The usefulness of the proposed method is shown theoretically and experimentally. We prove the convergence rate of the $L^2$ risk for LRR with reference to MS-$k$-NN, and our numerical experiments, including real-world datasets of daily stock indices, demonstrate that LRLR outperforms LPoR and MS-$k$-NN.
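
The fit-then-extrapolate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `local_radial_regression`, the choice of the `k` nearest neighbors as the local ball, the unweighted least-squares fit, and the plain polynomial in the distance `r` are all assumptions made for illustration. The paper's estimator may use a different basis, weighting, or neighborhood rule.

```python
import numpy as np

def local_radial_regression(X, y, query, k=50, degree=1):
    """Hypothetical sketch: estimate the label probability at `query`.

    Labels of the k nearest training points are regressed on their
    radial distance r from the query, and the fitted curve is
    evaluated at r = 0 (its intercept).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    query = np.asarray(query, dtype=float)

    # Radial distance from the query to every training point.
    r = np.linalg.norm(X - query, axis=1)

    # Assumption: the local ball is defined by the k nearest neighbors.
    idx = np.argsort(r)[:k]
    r_k, y_k = r[idx], y[idx]

    # Design matrix with columns 1, r, r^2, ..., r^degree.
    A = np.vander(r_k, degree + 1, increasing=True)

    # Ordinary least squares; coef[0] is the fitted value at r = 0.
    coef, *_ = np.linalg.lstsq(A, y_k, rcond=None)

    # A linear fit can leave [0, 1], so clip to a valid probability.
    return float(np.clip(coef[0], 0.0, 1.0))
```

A plain $k$-NN estimate would return the mean of `y_k`, which mixes in labels from points far from the query; the intercept of the radial fit is an attempt to remove that distance-dependent drift. A logistic variant in the spirit of LRLR would replace the least-squares step with a logistic regression on the same radial features and apply the sigmoid to the fitted intercept, which keeps the estimate inside $(0, 1)$ without clipping.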
