CL · Jul 27, 2023
Gzip versus bag-of-words for text classification
arXiv:2307.15002v5 · 2 citations · h-index: 16
Originality: Synthesis-oriented
AI Analysis
This note addresses the efficiency of text classification, which is relevant to researchers and practitioners, but its contribution is incremental because it compares existing methods.
The paper compares compression-based text classification (gzip) with bag-of-words approaches and finds that bag-of-words can achieve similar or better results while being more efficient.
The effectiveness of compression in text classification ('gzip') has recently garnered lots of attention. In this note we show that 'bag-of-words' approaches can achieve similar or better results, and are more efficient.
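
To make the comparison concrete, here is a minimal sketch of the two kinds of distance plugged into the same nearest-neighbour classifier. It is an illustration under stated assumptions, not the note's actual implementation: the gzip side uses the normalized compression distance commonly associated with gzip-based classification, the bag-of-words side assumes a simple Jaccard distance over lowercased whitespace tokens, and the names `ncd`, `bow_distance`, `knn_predict`, and the toy `TRAIN` data are all hypothetical.

```python
import gzip
from collections import Counter


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance (assumed form of the 'gzip' approach)."""
    ca, cb = compressed_len(a), compressed_len(b)
    cab = compressed_len(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def bow_distance(a: str, b: str) -> float:
    """Jaccard distance between word sets (an assumed bag-of-words variant)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 0.0
    return 1.0 - len(wa & wb) / len(wa | wb)


def knn_predict(query, train, distance, k=1):
    """Majority label among the k training texts closest to query."""
    nearest = sorted(train, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, only to show how the pieces fit together.
TRAIN = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the central bank raised interest rates", "finance"),
    ("stocks fell as markets closed lower", "finance"),
]

if __name__ == "__main__":
    query = "the bank cut interest rates again"
    print("bag-of-words:", knn_predict(query, TRAIN, bow_distance))
    print("gzip:", knn_predict(query, TRAIN, ncd))
```

The efficiency argument is visible in the sketch: `ncd` runs the compressor on every query-training pair, whereas the word sets used by `bow_distance` could be computed once per text and reused.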