CL, LG · Jun 24, 2021

Evaluation of Representation Models for Text Classification with AutoML Tools

arXiv:2106.12798v2 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of applying AutoML to unstructured text data. It is incremental: it benchmarks existing methods without introducing new techniques.

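The study compares three manually created text representations with text embeddings generated automatically by AutoML tools. In a benchmark of four open-source AutoML tools on eight text classification datasets, the straightforward manual representations performed better than the automatically created embeddings.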

Automated Machine Learning (AutoML) has gained increasing success on tabular data in recent years. However, processing unstructured data like text is a challenge and not widely supported by open-source AutoML tools. This work compares three manually created text representations and text embeddings automatically created by AutoML tools. Our benchmark includes four popular open-source AutoML tools and eight datasets for text classification purposes. The results show that straightforward text representations perform better than AutoML tools with automatically created text embeddings.
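
To make the idea of a manually created text representation concrete, here is a minimal sketch of one possibility: a bag-of-words count matrix built by hand, which turns raw text into fixed-length numeric columns that a tabular AutoML tool could consume. This listing does not name the three representations the paper uses, so bag-of-words is an assumption chosen for illustration, and the function names (`tokenize`, `fit_vocabulary`, `transform`) and sample texts are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_vocabulary(texts, max_features=1000):
    """Map the most frequent training tokens to column indices.

    Ties are broken alphabetically so the column order is deterministic.
    """
    counts = Counter(tok for text in texts for tok in tokenize(text))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok: i for i, (tok, _) in enumerate(ranked[:max_features])}


def transform(texts, vocabulary):
    """Turn each text into a fixed-length row of token counts.

    Tokens that are not in the vocabulary are ignored.
    """
    rows = []
    for text in texts:
        row = [0] * len(vocabulary)
        for tok in tokenize(text):
            idx = vocabulary.get(tok)
            if idx is not None:
                row[idx] += 1
        rows.append(row)
    return rows


# Hypothetical training texts, for illustration only.
train_texts = ["great movie, great cast", "terrible plot", "great plot"]
vocab = fit_vocabulary(train_texts)
X_train = transform(train_texts, vocab)
```

The resulting `X_train` is an ordinary numeric table, so it could be passed to any classifier or AutoML tool that accepts tabular input. The vocabulary is fitted on training texts only and reused for new texts, which keeps the column layout identical between training and prediction.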
