LG ML · Jun 3, 2024

Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning

Fan He, Mingzhen He, Lei Shi, Xiaolin Huang, Johan A. K. Suykens

arXiv:2406.01435v16.42 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses the performance limitations of kernel ridgeless regression for researchers in machine learning, offering a novel theoretical and practical improvement.

The paper tackles the lack of flexibility in kernel ridgeless regression by enhancing it with Locally-Adaptive-Bandwidths (LAB) RBF kernels, showing that the learned functions belong to an integral space of RKHSs and that optimization is equivalent to an ℓ₀-regularized problem, with experimental validation on synthetic and real datasets.

Ridgeless regression has garnered attention among researchers, particularly in light of the "Benign Overfitting" phenomenon, where models interpolating noisy samples demonstrate robust generalization. However, kernel ridgeless regression does not always perform well because it lacks flexibility. This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels, incorporating kernel learning techniques to improve performance in both experiments and theory. For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducing Kernel Hilbert Spaces (RKHSs). Despite the absence of explicit regularization in the proposed model, its optimization is equivalent to solving an $\ell_0$-regularized problem in the integral space of RKHSs, elucidating the origin of its generalization ability. Taking an approximation analysis viewpoint, we introduce an $\ell_q$-norm analysis technique (with $0<q<1$) to derive the learning rate for the proposed model under mild conditions. This result deepens our theoretical understanding, explaining that our algorithm's robust approximation ability arises from the large capacity of the integral space of RKHSs, while its generalization ability is ensured by sparsity, controlled by the number of support vectors. Experimental results on both synthetic and real datasets validate our theoretical conclusions.
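To make the abstract's setup concrete, here is a minimal sketch of ridgeless interpolation with a locally adaptive bandwidth RBF kernel. It is not the authors' implementation. It assumes the kernel takes the form K(t, x_i) = exp(-||θ_i ⊙ (t - x_i)||²), with one bandwidth vector θ_i per support vector, which makes the kernel matrix asymmetric in general. The bandwidths are fixed at random values here, whereas the paper learns them from data; the function names `lab_rbf_kernel`, `fit_ridgeless`, and `predict` are hypothetical.

```python
import numpy as np

def lab_rbf_kernel(T, X_sv, theta):
    """Assumed LAB RBF form: K[i, j] = exp(-||theta[j] * (T[i] - X_sv[j])||^2).

    T:     (n_t, d) query points
    X_sv:  (n_sv, d) support vectors
    theta: (n_sv, d) one bandwidth vector per support vector
    """
    diff = T[:, None, :] - X_sv[None, :, :]   # (n_t, n_sv, d)
    scaled = diff * theta[None, :, :]         # bandwidth depends on column j only
    return np.exp(-np.sum(scaled ** 2, axis=2))

def fit_ridgeless(X_sv, y_sv, theta):
    """Solve K alpha = y with no ridge term, so the fit interpolates."""
    K = lab_rbf_kernel(X_sv, X_sv, theta)
    return np.linalg.solve(K, y_sv)

def predict(T, X_sv, theta, alpha):
    return lab_rbf_kernel(T, X_sv, theta) @ alpha

# Toy data: 8 support vectors on [0, 1] with a smooth target.
rng = np.random.default_rng(0)
X_sv = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y_sv = np.sin(2.0 * np.pi * X_sv[:, 0])

# Illustrative fixed bandwidths (an assumption; the paper learns these).
theta = rng.uniform(10.0, 20.0, size=X_sv.shape)

K = lab_rbf_kernel(X_sv, X_sv, theta)
alpha = fit_ridgeless(X_sv, y_sv, theta)
y_fit = predict(X_sv, X_sv, theta, alpha)
```

In this sketch the model's size is set by the number of support vectors, which is the quantity the abstract ties to sparsity and generalization. Because each column of the kernel matrix uses its own bandwidth, `K` is generally not equal to its transpose, so a general linear solver is used rather than a Cholesky factorization.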
