ML · LG · ST · Oct 12, 2021

On the Self-Penalization Phenomenon in Feature Selection

arXiv:2110.05852v1 · 5 citations
Originality: Incremental advance
AI Analysis

This paper addresses feature selection in machine learning. It offers an approach that could improve model interpretability and efficiency, though it appears incremental relative to existing sparsity-inducing methods.

The paper tackles feature selection by introducing an implicit sparsity-inducing mechanism based on minimization over a family of kernels. Gradient descent on the resulting objective reaches exactly sparse stationary points with high probability, without explicit sparsification techniques such as penalization or early stopping.

We describe an implicit sparsity-inducing mechanism based on minimization over a family of kernels: \begin{equation*} \min_{β, f}~\widehat{\mathbb{E}}[L(Y, f(β^{1/q} \odot X))] + λ_n \|f\|_{\mathcal{H}_q}^2~~\text{subject to}~~β\ge 0, \end{equation*} where $L$ is the loss, $\odot$ is coordinate-wise multiplication and $\mathcal{H}_q$ is the reproducing kernel Hilbert space based on the kernel $k_q(x, x') = h(\|x-x'\|_q^q)$, where $\|\cdot\|_q$ is the $\ell_q$ norm. Using gradient descent to optimize this objective with respect to $β$ leads to exactly sparse stationary points with high probability. The sparsity is achieved without using any of the well-known explicit sparsification techniques such as penalization (e.g., $\ell_1$), early stopping or post-processing (e.g., clipping). As an application, we use this sparsity-inducing mechanism to build algorithms that are consistent for feature selection.
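The reparameterization in the objective can be made concrete. Scaling coordinate $j$ by $β_j^{1/q}$ multiplies $|x_j - x'_j|^q$ by $β_j$, so $k_q(β^{1/q} \odot x, β^{1/q} \odot x') = h\big(\sum_j β_j |x_j - x'_j|^q\big)$, and setting $β_j = 0$ removes feature $j$ from the kernel entirely. The Python sketch below illustrates this under assumptions that do not come from the paper: squared loss, $h(t) = e^{-t}$, $f$ profiled out in closed form by kernel ridge regression (for which the minimized objective is $λ_n y^\top (K + nλ_n I)^{-1} y$), and fixed-step projected gradient descent on $β$, where $\max(\cdot, 0)$ enforces $β \ge 0$. The helper names (`weighted_lq_distances`, `kernel_matrix`, `profiled_objective_and_grad`, `projected_gradient_descent`) and the default step size are hypothetical. This is not the authors' algorithm or code.

```python
import numpy as np


def weighted_lq_distances(X, q):
    """Return D with D[j, a, b] = |X[a, j] - X[b, j]| ** q, shape (p, n, n)."""
    diff = np.abs(X[:, None, :] - X[None, :, :]) ** q  # shape (n, n, p)
    return np.moveaxis(diff, -1, 0)


def kernel_matrix(beta, D):
    """K[a, b] = h(sum_j beta_j * D[j, a, b]) with the assumed choice h(t) = exp(-t).

    This equals k_q(beta**(1/q) * x_a, beta**(1/q) * x_b): scaling coordinate j
    by beta_j**(1/q) multiplies |x_aj - x_bj|**q by beta_j, so beta_j = 0 drops feature j.
    """
    return np.exp(-np.tensordot(beta, D, axes=1))


def profiled_objective_and_grad(beta, D, y, lam):
    """Squared-loss objective with f profiled out by kernel ridge regression (assumption).

    min_alpha (1/n)||y - K alpha||^2 + lam * alpha^T K alpha
    has optimal value J(beta) = lam * y^T (K + n*lam*I)^{-1} y.
    """
    n = y.shape[0]
    K = kernel_matrix(beta, D)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    value = lam * (y @ alpha)
    # dK/dbeta_j = -K * D[j] (elementwise), hence dJ/dbeta_j = lam * alpha^T (K * D[j]) alpha.
    grad = lam * np.einsum("a,jab,b->j", alpha, K[None, :, :] * D, alpha)
    return value, grad


def projected_gradient_descent(X, y, q=1.0, lam=0.1, lr=0.1, steps=300):
    """Hypothetical fixed-step gradient descent on beta, projected onto beta >= 0."""
    D = weighted_lq_distances(X, q)
    beta = np.ones(X.shape[1])  # start with every feature switched on
    for _ in range(steps):
        _, grad = profiled_objective_and_grad(beta, D, y, lam)
        beta = np.maximum(beta - lr * grad, 0.0)  # Euclidean projection onto beta >= 0
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(60, 5))
    # Only feature 0 carries signal; features 1-4 are noise.
    y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(60)
    print(np.round(projected_gradient_descent(X, y), 3))
```

In the usage example only the first feature carries signal. Whether the remaining entries of `beta` reach exactly zero depends on the step size, `lam` and the number of iterations, and the sketch makes no claim to reproduce the paper's high-probability guarantee.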

