CL, AI · Jul 4, 2023

KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation

Weijie Xu, Xiaoyu Jiang, Jay Desai, Bin Han, Fuqin Yan, Francis Iannacci

Amazon

arXiv:2307.01878v2 · 1.35 citations · h-index: 9

Originality: Incremental advance

AI Analysis

KDSTM offers a resource-efficient approach to text classification when labeled data and compute are limited, though the advance is incremental: it builds on existing topic modeling and knowledge distillation techniques.

The paper addresses text classification without pretrained embeddings by developing KDSTM, a semi-supervised topic modeling method that leverages knowledge distillation. It outperforms existing supervised topic modeling methods in accuracy, robustness, and efficiency, and performs comparably to state-of-the-art weakly supervised methods.

In text classification tasks, fine-tuning pretrained language models like BERT and GPT-3 yields competitive accuracy; however, both methods require pretraining on large text datasets. In contrast, general topic modeling methods can analyze documents to extract meaningful patterns of words without the need for pretraining. To leverage topic modeling's unsupervised insight extraction for text classification tasks, we develop the Knowledge Distillation Semi-supervised Topic Modeling (KDSTM). KDSTM requires no pretrained embeddings and only a few labeled documents, and it is efficient to train, making it ideal for resource-constrained settings. Across a variety of datasets, our method outperforms existing supervised topic modeling methods in classification accuracy, robustness, and efficiency, and achieves performance similar to state-of-the-art weakly supervised text classification methods.
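The abstract does not spell out KDSTM's training objective, so the following is only a minimal sketch of a generic soft-label knowledge distillation loss combined with a supervised term for the few labeled documents. The function names, the `alpha` weighting, the temperature value, and the rule that unlabeled documents receive only the distillation term are all assumptions made for illustration, not the authors' formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is the usual scaling that keeps gradient magnitudes
    comparable across temperatures in standard distillation.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def cross_entropy(logits, label):
    """Negative log-likelihood of the true class index."""
    return -math.log(softmax(logits)[label])


def semi_supervised_loss(student_logits, teacher_logits, label=None,
                         alpha=0.5, temperature=2.0):
    """Hypothetical combined objective (an assumption, not from the paper).

    Labeled documents mix cross-entropy with distillation; unlabeled
    documents (label=None) use the distillation term alone.
    """
    kd = distillation_loss(student_logits, teacher_logits, temperature)
    if label is None:
        return kd
    return alpha * cross_entropy(student_logits, label) + (1 - alpha) * kd
```

In this sketch the teacher logits would come from whatever model supplies the soft targets; when student and teacher agree exactly, the distillation term is zero and only the labeled documents contribute to the loss.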
