CLAILGOct 26, 2022

OTSeq2Set: An Optimal Transport Enhanced Sequence-to-Set Model for Extreme Multi-label Text Classification

arXiv:2210.14523v2292 citationsh-index: 2Has Code
Originality Highly original
AI Analysis

This addresses the challenge of accurately predicting unordered label sets in large-scale text classification, which is incremental over existing Seq2Seq methods.

The paper tackles the problem of predicting variable-length, unordered label subsets in extreme multi-label text classification by proposing OTSeq2Set, an autoregressive sequence-to-set model that uses optimal transport and bipartite matching for permutation-invariant training. It outperforms baselines on 4 benchmark datasets, achieving a 16.34% improvement in micro-F1 score over the state-of-the-art Seq2Seq method on the Wikipedia dataset with 31k labels.

Extreme multi-label text classification (XMTC) is the task of finding the most relevant subset labels from an extremely large-scale label collection. Recently, some deep learning models have achieved state-of-the-art results in XMTC tasks. These models commonly predict scores for all labels by a fully connected layer as the last layer of the model. However, such models can't predict a relatively complete and variable-length label subset for each document, because they select positive labels relevant to the document by a fixed threshold or take top k labels in descending order of scores. A less popular type of deep learning models called sequence-to-sequence (Seq2Seq) focus on predicting variable-length positive labels in sequence style. However, the labels in XMTC tasks are essentially an unordered set rather than an ordered sequence, the default order of labels restrains Seq2Seq models in training. To address this limitation in Seq2Seq, we propose an autoregressive sequence-to-set model for XMTC tasks named OTSeq2Set. Our model generates predictions in student-forcing scheme and is trained by a loss function based on bipartite matching which enables permutation-invariance. Meanwhile, we use the optimal transport distance as a measurement to force the model to focus on the closest labels in semantic label space. Experiments show that OTSeq2Set outperforms other competitive baselines on 4 benchmark datasets. Especially, on the Wikipedia dataset with 31k labels, it outperforms the state-of-the-art Seq2Seq method by 16.34% in micro-F1 score. The code is available at https://github.com/caojie54/OTSeq2Set.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes