CL AI LG · Jan 4, 2022

ZeroBERTo: Leveraging Zero-Shot Text Classification by Topic Modeling

Alexandre Alcoforado, Thomas Palmeira Ferraz, Rodrigo Gerber, Enzo Bustos, André Seidel Oliveira, Bruno Miguel Veloso, Fabio Levy Siqueira, Anna Helena Reali Costa

arXiv:2201.01337v3 · 1.927 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient zero-shot classification in low-resource settings, though it is an incremental advance that builds on existing Transformer-based methods.

The paper tackles two limitations of Transformer-based zero-shot text classification, the inability to handle long texts and high execution time, by proposing ZeroBERTo, which applies an unsupervised clustering step before classification. It reports an F1 score about 12% higher than XLM-R on the FolhaUOL dataset.

Traditional text classification approaches often require a good amount of labeled data, which is difficult to obtain, especially in restricted domains or less widespread languages. This lack of labeled data has led to the rise of low-resource methods, which assume low data availability in natural language processing. Among them, zero-shot learning stands out: it consists of learning a classifier without any previously labeled data. The best results reported with this approach use language models such as Transformers, but suffer from two problems: high execution time and inability to handle long texts as input. This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task. We show that ZeroBERTo performs better on long inputs and has a shorter execution time, outperforming XLM-R by about 12% in F1 score on the FolhaUOL dataset.

Keywords: Low-Resource NLP, Unlabeled data, Zero-Shot Learning, Topic Modeling, Transformers.
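The abstract's core idea, clustering documents first and then running the expensive zero-shot classifier only on a short representation of each cluster, can be sketched as below. This is a minimal, dependency-free illustration under several assumptions: plain k-means over bag-of-words vectors stands in for the paper's topic model, the most frequent words of each cluster serve as its topic representation, and `toy_zero_shot` with its `HINTS` table is a hypothetical placeholder for an NLI-based zero-shot model such as XLM-R. Each document simply inherits its cluster's label; ZeroBERTo's actual way of combining topic and label scores may differ. All function names are illustrative, not taken from the paper's code.

```python
from collections import Counter
import math


def bag_of_words(doc):
    # Term counts for a lowercased, whitespace-tokenized document.
    return Counter(doc.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(n * b.get(w, 0) for w, n in a.items())
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, iters=10):
    # k-means variant that assigns by cosine similarity; centroid = mean term counts.
    # Assumption for the demo: deterministic init with the first k vectors.
    centroids = [dict(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [vec for vec, a in zip(vectors, assign) if a == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = {w: n / len(members) for w, n in total.items()}
    return assign


def topic_keywords(docs, assign, cluster, n=3):
    # Compressed topic representation: the n most frequent words in the cluster.
    counts = Counter()
    for doc, a in zip(docs, assign):
        if a == cluster:
            counts.update(bag_of_words(doc))
    return [w for w, _ in counts.most_common(n)]


# Hypothetical stand-in for an NLI-based zero-shot classifier: picks the label
# whose made-up hint words overlap most with the input text.
HINTS = {
    "sports": {"team", "goal", "match", "league"},
    "economy": {"bank", "inflation", "market", "rates"},
}


def toy_zero_shot(text, labels):
    words = set(text.split())
    return max(labels, key=lambda lab: len(words & HINTS.get(lab, set())))


def cluster_then_classify(docs, labels, k, classify):
    # 1) cluster documents, 2) classify each short topic representation once,
    # 3) every document inherits the label of its cluster.
    assign = kmeans([bag_of_words(d) for d in docs], k)
    topic_label = {
        c: classify(" ".join(topic_keywords(docs, assign, c)), labels)
        for c in set(assign)
    }
    return [topic_label[a] for a in assign]


DOCS = [
    "team wins league match with late goal",
    "bank raises rates as inflation climbs",
    "keeper saves penalty and team celebrates goal",
    "market falls after bank inflation report",
]

calls = []


def counting_classifier(text, labels):
    # Records every classifier invocation so it can be compared with len(DOCS).
    calls.append(text)
    return toy_zero_shot(text, labels)


predicted = cluster_then_classify(DOCS, ["sports", "economy"], 2, counting_classifier)
print(predicted)   # ['sports', 'economy', 'sports', 'economy']
print(len(calls))  # 2 classifier calls for 4 documents
```

With four documents and two clusters, the classifier runs twice instead of four times, and each call sees a three-word topic summary rather than a full text. This is one way to read the abstract's claims of shorter execution time and better handling of long inputs, since the costly model never receives the raw documents.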
