Iteratively reweighted kernel machines efficiently learn sparse functions
This work asks whether two abilities usually credited to neural networks, learning low-dimensional data representations and learning hierarchical structure, can also be obtained from classical kernel methods.
The paper shows that the derivative of a kernel predictor detects the influential coordinates of a sparse function with low sample complexity, and that iteratively reweighting the data with these derivatives and retraining lets kernel machines efficiently learn hierarchical polynomials with finite leap complexity. Numerical experiments support the theory.
The impressive practical performance of neural networks is often attributed to their ability to learn low-dimensional data representations and hierarchical structure directly from data. In this work, we argue that these two phenomena are not unique to neural networks, and can be elicited from classical kernel methods. Namely, we show that the derivative of the kernel predictor can detect the influential coordinates with low sample complexity. Moreover, by iteratively using the derivatives to reweight the data and retrain kernel machines, one is able to efficiently learn hierarchical polynomials with finite leap complexity. Numerical experiments illustrate the developed theory.
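The reweight-and-retrain loop described above can be illustrated with a minimal sketch. It is not the paper's exact algorithm: it assumes a Gaussian kernel, kernel ridge regression as the kernel machine, and coordinate weights set to the average squared partial derivative of the predictor over the training points. The function names (`gaussian_kernel`, `coordinate_scores`, `reweighted_krr`), the bandwidth, the ridge parameter, and the toy target `x0 + x0*x1` are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, w, bandwidth):
    # Gaussian kernel on coordinate-reweighted inputs:
    # K(a, b) = exp(-sum_k w_k (a_k - b_k)^2 / (2 * bandwidth^2))
    Aw, Bw = A * np.sqrt(w), B * np.sqrt(w)
    sq = (Aw**2).sum(1)[:, None] + (Bw**2).sum(1)[None, :] - 2.0 * Aw @ Bw.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def coordinate_scores(X, alpha, w, bandwidth):
    # The predictor is f(x) = sum_j alpha_j K(x, x_j), so
    # df/dx_k (x) = -(w_k / bandwidth^2) * sum_j alpha_j K(x, x_j) (x_k - x_jk).
    K = gaussian_kernel(X, X, w, bandwidth)
    diffs = X[:, None, :] - X[None, :, :]              # shape (n, n, d)
    grads = -(w / bandwidth**2) * np.einsum("ij,ijk->ik", K * alpha[None, :], diffs)
    # Average squared partial derivative per coordinate over the training set.
    return (grads**2).mean(axis=0)

def reweighted_krr(X, y, n_iters=3, ridge=1e-3):
    # Assumed scheme: fit kernel ridge regression, score the coordinates by the
    # predictor's derivatives, rescale the coordinates, and refit.
    n, d = X.shape
    bandwidth = np.sqrt(d)                             # illustrative default
    w = np.ones(d)
    for _ in range(n_iters):
        K = gaussian_kernel(X, X, w, bandwidth)
        alpha = np.linalg.solve(K + ridge * np.eye(n), y)
        scores = coordinate_scores(X, alpha, w, bandwidth)
        w = scores / scores.max()                      # normalize so max weight is 1
    return w

# Toy problem (an assumption, not from the paper): 8 Gaussian coordinates,
# of which only the first two influence the target.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 8))
y = X[:, 0] + X[:, 0] * X[:, 1]
w = reweighted_krr(X, y)
top_two = set(np.argsort(w)[-2:].tolist())
```

In this sketch `w` plays the role of the learned coordinate reweighting, and `top_two` holds the indices of the two largest weights. A coordinate whose weight reaches exactly zero can never recover under this update, so a practical variant might add a small floor to `w`.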