LG, CL, ML | Jul 12, 2018

Orthogonal Matching Pursuit for Text Classification

arXiv:1807.04715v21089 citations | Has Code
AI Analysis

This paper addresses the problem of balancing accuracy and sparsity in text classification, which matters to practitioners, though the contribution is incremental: it adapts existing methods to this domain.

The paper tackles overfitting in text classification by applying Orthogonal Matching Pursuit (OMP) and an overlapping variant to produce sparse models, achieving effective regularization with high sparsity.

In text classification, the problem of overfitting arises due to the high dimensionality, making regularization essential. Although classic regularizers provide sparsity, they fail to return highly accurate models. In contrast, state-of-the-art group-lasso regularizers provide better results at the expense of low sparsity. In this paper, we apply a greedy variable selection algorithm, called Orthogonal Matching Pursuit, for the text classification task. We also extend standard group OMP by introducing overlapping Group OMP to handle overlapping groups of features. Empirical analysis verifies that both OMP and overlapping GOMP constitute powerful regularizers, able to produce effective and very sparse models. Code and data are available online: https://github.com/y3nk0/OMP-for-Text-Classification.
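
To make the greedy selection step concrete, here is a minimal sketch of textbook Orthogonal Matching Pursuit in pure Python. It is not the authors' implementation (see the linked repository for that). It assumes a squared-error fit to numeric targets, whereas the paper's classification loss may differ, and the helper names `dot`, `solve_least_squares`, and `omp`, as well as the toy bag-of-words matrix, are hypothetical. At each step the feature most correlated with the current residual is added, and all selected weights are refit jointly.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def solve_least_squares(columns, y):
    """Least-squares weights for the given feature columns, found by solving
    the normal equations with Gauss-Jordan elimination (partial pivoting)."""
    k = len(columns)
    # Augmented matrix [G | b] with G[i][j] = <col_i, col_j>, b[i] = <col_i, y>.
    aug = [[dot(columns[i], columns[j]) for j in range(k)] + [dot(columns[i], y)]
           for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def omp(X, y, max_features, tol=1e-10):
    """Greedy OMP: returns (selected feature indices, their weights).
    Assumption: squared loss; the paper's objective may differ."""
    n, d = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(d)]
    norms = [math.sqrt(dot(c, c)) for c in cols]
    selected, weights, residual = [], [], list(y)
    while len(selected) < max_features:
        # Normalized correlation of each unused feature with the residual.
        scores = [
            0.0 if j in selected or norms[j] == 0.0
            else abs(dot(cols[j], residual)) / norms[j]
            for j in range(d)
        ]
        best = max(range(d), key=scores.__getitem__)
        if scores[best] <= tol:  # nothing left to explain
            break
        selected.append(best)
        active = [cols[j] for j in selected]
        # Orthogonal step: refit all selected weights, then update the residual.
        weights = solve_least_squares(active, y)
        residual = [y[i] - sum(w * col[i] for w, col in zip(weights, active))
                    for i in range(n)]
    return selected, weights

# Toy term-count matrix (6 documents, 4 terms); targets depend on terms 0 and 2 only.
X = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 1],
     [1, 1, 0, 0], [0, 0, 1, 0], [1, 0, 1, 1]]
y = [2 * row[0] - row[2] for row in X]
selected, weights = omp(X, y, max_features=3)
```

On this toy data the loop stops after two features because the residual vanishes, which illustrates how the budget `max_features` and the tolerance `tol` together control sparsity. A group variant would score and add whole groups of columns per step instead of single features.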

Code Implementations: 1 repo
