LG CL NE | Jul 23, 2021

Text Classification and Clustering with Annealing Soft Nearest Neighbor Loss

arXiv:2107.14597v1 | 2 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving text classification and clustering performance for natural language processing tasks, presenting an incremental method based on disentanglement.

The paper tackles the problem of learning better natural language representations for text classification and clustering by maximizing disentanglement, which measures the separation between different classes relative to distances within the same class. This approach achieved a test classification accuracy of 90.11% and clustering accuracy of 88% on the AG News dataset, outperforming baseline models.

We define disentanglement as how far apart data points of different classes are, relative to the distances among data points of the same class. When maximizing disentanglement during representation learning, we obtain a transformed feature representation in which the class memberships of the data points are preserved. In such a feature representation space, a nearest neighbour classifier or a clustering algorithm would perform well. We take advantage of this method to learn better natural language representations, and employ it on text classification and text clustering tasks. Through disentanglement, we obtain text representations with better-defined clusters and improve text classification performance. Our approach achieved a test classification accuracy of up to 90.11% and a test clustering accuracy of 88% on the AG News dataset, outperforming our baseline models without any other training tricks or regularization.
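The abstract does not give the loss formula, so the following is only a minimal sketch of the commonly cited soft nearest neighbor loss (as formulated by Frosst et al., 2019), which the title suggests the paper builds on. The function name, the `temperature` and `eps` parameters, and the toy points are assumptions for illustration, not the authors' implementation, and no annealing schedule is shown because the text does not specify one. A lower loss corresponds to higher disentanglement: same-class points are close relative to points of other classes.

```python
import math


def soft_nearest_neighbor_loss(points, labels, temperature=1.0, eps=1e-12):
    """Mean negative log-probability of sampling a same-class neighbor.

    Hypothetical helper: each neighbor is weighted by
    exp(-squared_distance / temperature).
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        same_class = 0.0  # weight of neighbors sharing the label of point i
        all_others = 0.0  # weight of every neighbor except point i itself
        for j in range(n):
            if i == j:
                continue
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            weight = math.exp(-sq_dist / temperature)
            all_others += weight
            if labels[i] == labels[j]:
                same_class += weight
        # eps guards against log(0) and division by zero
        total += -math.log(eps + same_class / (eps + all_others))
    return total / n


# Toy data (assumed): two tight groups that are far apart.
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 3.0)]

# Labels that follow the groups: disentangled, so the loss is near zero.
separated = soft_nearest_neighbor_loss(points, [0, 0, 1, 1])

# Labels that cut across the groups: entangled, so the loss is large.
entangled = soft_nearest_neighbor_loss(points, [0, 1, 0, 1])
```

In this sketch, a representation learner would add such a term to its training objective so that minimizing it pulls same-class points together; a smaller `temperature` makes the loss focus on the closest neighbors.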
