CL · Jul 27, 2023

Gzip versus bag-of-words for text classification

arXiv:2307.15002v5 · 2 citations · h-index: 16
Originality: Synthesis-oriented
AI Analysis

This work addresses text classification efficiency, a concern for both researchers and practitioners. Its contribution is incremental, however: it compares existing methods rather than introducing a new one.

The paper tackles text classification by comparing compression-based methods (gzip) with bag-of-words approaches, and finds that bag-of-words can achieve similar or better results while being more efficient.
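For readers unfamiliar with the compression-based side of the comparison, the sketch below shows the usual recipe: a normalized compression distance (NCD) computed from gzip-compressed lengths, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) with C the compressed length, plugged into a k-nearest-neighbour vote. This is a minimal illustration under assumptions (space-joined concatenation, a four-sentence toy dataset, and the hypothetical helper names `clen`, `ncd`, and `knn_gzip`), not the exact pipeline evaluated in the paper.

```python
import gzip
from collections import Counter

# Hypothetical helpers: a sketch of gzip + NCD + kNN, not the paper's pipeline.


def clen(text: str) -> int:
    """Byte length of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: close to 0 for near-duplicates, larger for unrelated texts."""
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)  # joining with a space is an assumed, common choice
    return (cab - min(ca, cb)) / max(ca, cb)


def knn_gzip(train: list[tuple[str, str]], query: str, k: int = 3) -> str:
    """Majority label among the k training texts closest to query under NCD."""
    # Each query is compressed together with every training text, so the cost
    # per prediction grows with the size of the training set.
    dists = sorted(((ncd(query, text), label) for text, label in train), key=lambda d: d[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
train = [
    ("the team won the football match", "sports"),
    ("the striker scored a goal in the match", "sports"),
    ("stocks fell as the market closed lower", "finance"),
    ("investors sold shares as the market fell", "finance"),
]
print(knn_gzip(train, "the team won the football match", k=1))  # prints "sports"
```

Very short strings are dominated by gzip's fixed header and trailer, so toy distances like these are only indicative; the method is normally applied to full documents.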

The effectiveness of compression in text classification ('gzip') has recently garnered lots of attention. In this note we show that 'bag-of-words' approaches can achieve similar or better results, and are more efficient.
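To make the efficiency argument concrete, here is a minimal bag-of-words classifier in Python. Assumptions: lowercase whitespace tokenization, cosine similarity over raw word counts, and k-nearest-neighbour voting. The helpers `bow`, `cosine`, `fit_bow`, and `predict_bow` and the toy data are hypothetical, and this is not necessarily the configuration evaluated in the paper. Training texts are vectorized once, so each query costs one sparse dot product per training example instead of a compression pass.

```python
import math
from collections import Counter

# Hypothetical helpers: a bag-of-words kNN sketch, not the paper's exact setup.


def bow(text: str) -> Counter:
    """Bag-of-words vector: lowercased whitespace tokens mapped to their counts."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[word] for word, count in u.items())  # missing words count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fit_bow(train: list[tuple[str, str]]) -> list[tuple[Counter, str]]:
    """Vectorize the training set once; every query reuses these vectors."""
    return [(bow(text), label) for text, label in train]


def predict_bow(model: list[tuple[Counter, str]], query: str, k: int = 3) -> str:
    """Majority label among the k training vectors most cosine-similar to query."""
    q = bow(query)
    scored = sorted(((cosine(q, vec), label) for vec, label in model), key=lambda s: s[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
train = [
    ("the team won the football match", "sports"),
    ("the striker scored a goal in the match", "sports"),
    ("stocks fell as the market closed lower", "finance"),
    ("investors sold shares as the market fell", "finance"),
]
model = fit_bow(train)
print(predict_bow(model, "the football match was won by the home team"))  # prints "sports"
print(predict_bow(model, "the market fell and investors sold stocks"))  # prints "finance"
```

Swapping the kNN vote for a linear classifier over the same count vectors keeps the one-time vectorization step while avoiding the per-query scan of the training set.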

Code Implementations: 1 repo