Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness
This work offers practitioners theoretical insight into the trade-off between sparsity and smoothness when choosing a regularizer, though it is incremental: it refines existing analysis rather than introducing new methods.
The paper investigates the learning rates of multiple kernel learning with ℓ₁ and elastic-net regularizations in sparse settings, showing sharper convergence rates than previously known and revealing that elastic-net regularization achieves faster convergence when the ground truth is smooth, while ℓ₁ regularization is faster otherwise.
We investigate the learning rate of multiple kernel learning (MKL) with $\ell_1$ and elastic-net regularizations. The elastic-net regularization combines an $\ell_1$-regularizer, which induces sparsity, with an $\ell_2$-regularizer, which controls smoothness. We focus on a sparse setting in which the total number of kernels is large but the number of nonzero components of the ground truth is relatively small, and we derive convergence rates sharper than any previously shown for both $\ell_1$ and elastic-net regularizations. Our analysis clarifies how the choice of regularization function relates to performance. If the ground truth is smooth, we show that the elastic-net regularization attains a faster convergence rate than $\ell_1$-regularization, and under fewer conditions; otherwise, $\ell_1$-regularization is shown to converge faster.
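The abstract describes the elastic-net regularizer as an $\ell_1$ part that induces sparsity across kernels plus an $\ell_2$ part that controls smoothness. The sketch below illustrates those two roles under stated assumptions. It takes the common penalty form $\sum_m (\lambda_1 \|f_m\| + \lambda_2 \|f_m\|^2)$, and the paper's exact estimator may include different or additional terms. It also represents each kernel component $f_m$ by a finite coefficient vector whose Euclidean norm stands in for the RKHS norm $\|f_m\|_{\mathcal{H}_m}$. The function names are hypothetical. This is an illustration of the regularizer, not the paper's method, since the paper is a theoretical analysis of learning rates.

```python
import numpy as np

# ASSUMPTION: each kernel component f_m is a finite coefficient vector and
# its Euclidean norm stands in for the RKHS norm ||f_m||_{H_m}.

def elastic_net_mkl_penalty(components, lam1, lam2):
    """Hypothetical elastic-net MKL penalty: sum_m (lam1*||f_m|| + lam2*||f_m||^2)."""
    norms = np.array([np.linalg.norm(c) for c in components])
    # lam1 term: sum of (non-squared) norms, the group-l1 part that favors sparsity
    # lam2 term: sum of squared norms, the l2 part that favors smoothness
    return lam1 * norms.sum() + lam2 * (norms ** 2).sum()


def prox_elastic_net_group(v, lam1, lam2):
    """Proximal operator: argmin_x 0.5*||x - v||^2 + lam1*||x|| + lam2*||x||^2.

    The group-l1 part zeroes the whole component when ||v|| <= lam1
    (the kernel is dropped); the l2 part then shrinks any surviving
    component uniformly by the factor 1 / (1 + 2*lam2).
    """
    v = np.asarray(v, dtype=float)
    r = np.linalg.norm(v)
    if r <= lam1:
        return np.zeros_like(v)
    return (1.0 - lam1 / r) * v / (1.0 + 2.0 * lam2)


def prox_all(components, lam1, lam2):
    """Apply the proximal step to every component; count the nonzero (active) kernels."""
    out = [prox_elastic_net_group(c, lam1, lam2) for c in components]
    active = sum(1 for x in out if np.linalg.norm(x) > 0)
    return out, active


if __name__ == "__main__":
    comps = [np.array([3.0, 4.0]), np.array([0.3, 0.4]), np.array([0.0, 2.0])]
    print(elastic_net_mkl_penalty(comps, lam1=1.0, lam2=0.5))
    for lam2 in (0.0, 0.5):
        shrunk, active = prox_all(comps, lam1=1.0, lam2=lam2)
        print(lam2, active, [s.round(3).tolist() for s in shrunk])
```

With these example vectors, the $\ell_1$ part removes the small middle component in both runs, so two of three kernels stay active. Increasing `lam2` from 0 to 0.5 leaves that support unchanged and only shrinks the survivors further, which mirrors the division of labor between sparsity and smoothness described in the abstract.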