CL · May 18, 2020

Interaction Matching for Long-Tail Multi-Label Classification

arXiv:2005.08805v1 · 5 citations
Originality: Incremental advance
AI Analysis

This paper addresses the long-tail label problem in multi-label classification for domains such as medical coding and software tutorials, offering an incremental improvement over existing methods.

The paper tackles the bias toward frequent labels in multi-label classification by incorporating interaction matching, a technique from ad-hoc search ranking. It reports up to an 11% relative improvement in macro performance, with most of the gains coming from infrequent labels.
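To see why a macro-averaged score is sensitive to infrequent labels, the sketch below compares micro- and macro-averaged F1 on made-up counts. The summary does not say which macro metric the paper reports, so F1 is an assumption here, and the label names and counts are purely illustrative.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(counts):
    """counts maps label -> (tp, fp, fn). Returns (micro_f1, macro_f1)."""
    # Micro: pool the counts over all labels, then compute one F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    return micro, macro


# Illustrative counts (not from the paper): one frequent label predicted
# well, two rare labels that the classifier never predicts.
counts = {
    "frequent": (90, 10, 10),
    "rare_a": (0, 0, 5),
    "rare_b": (0, 0, 5),
}
micro, macro = micro_macro_f1(counts)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

Because every label counts equally in the macro average, the two missed rare labels pull it far below the micro average, which is dominated by the frequent label. Gains on the long tail therefore show up mainly in macro scores.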

We present an elegant and effective approach for addressing limitations in existing multi-label classification models by incorporating interaction matching, a concept shown to be useful for ad-hoc search result ranking. By performing soft n-gram interaction matching, we match labels with natural language descriptions (which are commonly available in most multi-labeling tasks). Our approach can be used to enhance existing multi-label classification approaches, which are biased toward frequently occurring labels. We evaluate our approach on two challenging tasks: automatic medical coding of clinical notes and automatic labeling of entities from software tutorial text. Our results show that our method can yield up to an 11% relative improvement in macro performance, with most of the gains stemming from labels that appear infrequently in the training set (i.e., the long tail of labels).
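The abstract does not spell out the matching mechanism, so the following is only a minimal sketch of what soft n-gram matching between a text and a label description can look like, in the spirit of kernel-based ranking models. Everything in it is an assumption for illustration: the toy embeddings in `EMB`, the use of averaged token vectors as n-gram representations (a learned convolution would be the usual choice), and the single Gaussian kernel with parameters `mu` and `sigma`.

```python
import math

# Hypothetical toy embeddings; a real model would learn these.
EMB = {
    "chest": [1.0, 0.0, 0.0],
    "thoracic": [0.9, 0.1, 0.0],
    "pain": [0.0, 1.0, 0.0],
    "ache": [0.1, 0.9, 0.0],
    "leg": [0.2, 0.0, 0.8],
    "fracture": [0.0, 0.0, 1.0],
}


def ngram_vectors(tokens, n):
    """Average the embeddings of each contiguous n-gram
    (a simple stand-in for a learned convolution)."""
    vectors = []
    for i in range(len(tokens) - n + 1):
        window = [EMB[t] for t in tokens[i:i + n]]
        vectors.append([sum(dim) / n for dim in zip(*window)])
    return vectors


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def soft_match_score(text_tokens, label_tokens, n=2, mu=1.0, sigma=0.1):
    """For each label n-gram, softly count the text n-grams whose cosine
    similarity lies near mu, then sum the log-scaled counts."""
    text_ngrams = ngram_vectors(text_tokens, n)
    label_ngrams = ngram_vectors(label_tokens, n)
    score = 0.0
    for lv in label_ngrams:
        soft_count = sum(
            math.exp(-((cosine(tv, lv) - mu) ** 2) / (2 * sigma ** 2))
            for tv in text_ngrams
        )
        score += math.log1p(soft_count)
    return score


# The text shares no exact words with either label description.
text = ["thoracic", "ache"]
score_chest = soft_match_score(text, ["chest", "pain"])
score_leg = soft_match_score(text, ["leg", "fracture"])
print(score_chest > score_leg)
```

The point of the soft match is that "thoracic ache" scores highly against the description "chest pain" despite having no words in common with it. A label with few training examples can still be matched this way through its description.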

