CL · Dec 11, 2022

FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification

arXiv:2212.05506v2291 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses the time and data efficiency of weakly-supervised text classification, an issue for both researchers and practitioners, though the advance is incremental because it builds on existing keyword-driven methods.

The paper tackles weakly-supervised text classification by proposing FastClass, which uses dense text representation to retrieve class-relevant documents from an external corpus. This reduces reliance on the initial class descriptions and shortens training. In the reported experiments, FastClass often outperforms keyword-driven models in accuracy and often trains orders of magnitude faster.

Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these methods not only rely on carefully crafted class descriptions to obtain class-specific keywords but also require a substantial amount of unlabeled data and take a long time to train. This paper proposes FastClass, an efficient weakly-supervised classification approach. It uses dense text representation to retrieve class-relevant documents from an external unlabeled corpus and selects an optimal subset to train a classifier. Compared to keyword-driven methods, our approach is less reliant on initial class descriptions as it no longer needs to expand each class description into a set of class-specific keywords. Experiments on a wide range of classification tasks show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.
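
The retrieve-then-train idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real dense encoder, taking the top `k` documents per class is a simplification of the paper's subset selection, and the nearest-centroid classifier replaces whatever model the paper trains. All function names, the toy corpus, and the class descriptions are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: class descriptions and an external unlabeled corpus.
CLASSES = {"sports": "football match", "technology": "phone software"}
CORPUS = [
    "the team won the football match",
    "the striker scored a goal in the match",
    "the new phone has a fast processor",
    "the laptop processor and software update",
]

def build_vocab(texts):
    """Collect the sorted set of lowercase whitespace tokens."""
    return sorted({w for t in texts for w in t.lower().split()})

def embed(text, vocab):
    """Toy stand-in for a dense encoder: L2-normalised word-count vector."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(classes, corpus, vocab, k):
    """Pseudo-label the k corpus documents most similar to each description."""
    doc_vecs = [embed(d, vocab) for d in corpus]
    pseudo = []
    for label, description in classes.items():
        query = embed(description, vocab)
        ranked = sorted(range(len(corpus)),
                        key=lambda i: dot(query, doc_vecs[i]), reverse=True)
        pseudo.extend((corpus[i], label) for i in ranked[:k])
    return pseudo

def train_centroids(pseudo, vocab):
    """Assumed classifier: one unit-length centroid per class."""
    sums = {}
    for text, label in pseudo:
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, x in enumerate(embed(text, vocab)):
            acc[i] += x
    centroids = {}
    for label, acc in sums.items():
        norm = math.sqrt(sum(v * v for v in acc)) or 1.0
        centroids[label] = [v / norm for v in acc]
    return centroids

def predict(text, centroids, vocab):
    """Assign the class whose centroid is most similar to the text."""
    vec = embed(text, vocab)
    return max(centroids, key=lambda label: dot(vec, centroids[label]))

VOCAB = build_vocab(list(CLASSES.values()) + CORPUS)
PSEUDO = retrieve(CLASSES, CORPUS, VOCAB, k=2)
CENTROIDS = train_centroids(PSEUDO, VOCAB)
```

With these definitions, `predict("a goal in the match", CENTROIDS, VOCAB)` classifies a new sentence using only the pseudo-labeled documents; no keyword expansion of the class descriptions is involved, which is the property the abstract emphasizes.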

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
