CL, LG | Oct 30, 2024

Don't Pay Attention, PLANT It: Pretraining Attention via Learning-to-Rank

Debjyoti Saha Roy, Byron C. Wallace, Javed A. Aslam

arXiv:2410.23066v21.0 | h-index: 42 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses inefficient attention learning in multi-label classification for domains such as healthcare and law, offering an incremental improvement through better attention initialization.

The paper tackles the challenge of learning good attention weights in Extreme Multi-Label Text Classification by introducing PLANT, a plug-and-play strategy that initializes attention using a pretrained Learning-to-Rank model. The authors report that it outperforms state-of-the-art methods on tasks such as ICD coding and legal topic classification, with substantial improvements in few-shot settings and on rare labels.

State-of-the-art Extreme Multi-Label Text Classification models rely on multi-label attention to focus on key tokens in input text, but learning good attention weights is challenging. We introduce PLANT (Pretrained and Leveraged Attention), a plug-and-play strategy for initializing attention. PLANT works by planting label-specific attention using a pretrained Learning-to-Rank model guided by mutual information gain. This architecture-agnostic approach integrates seamlessly with large language model backbones such as Mistral-7B, LLaMA3-8B, DeepSeek-V3, and Phi-3. PLANT outperforms state-of-the-art methods across tasks including ICD coding, legal topic classification, and content recommendation. Gains are especially pronounced in few-shot settings, with substantial improvements on rare labels. Ablation studies confirm that attention initialization is a key driver of these gains. For code and trained models, see https://github.com/debjyotiSRoy/xcube/tree/plant
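To make the idea of "planting" label-specific attention more concrete, here is a minimal sketch in Python. It is not the authors' implementation: it skips the Learning-to-Rank model entirely and uses raw token-label mutual information as the initial attention logits, which is a deliberate simplification. The function names `mutual_information` and `planted_attention`, the `temperature` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def mutual_information(X, Y, eps=1e-12):
    """Mutual information (in nats) between binary token presence and each binary label.

    X: (n_docs, n_tokens) 0/1 matrix, Y: (n_docs, n_labels) 0/1 matrix.
    Returns an (n_tokens, n_labels) matrix of MI scores.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n_docs = X.shape[0]
    mi = np.zeros((X.shape[1], Y.shape[1]))
    # Sum over the four cells of the 2x2 contingency table for every pair.
    for x_val in (0, 1):
        Xv = X if x_val == 1 else 1.0 - X
        for y_val in (0, 1):
            Yv = Y if y_val == 1 else 1.0 - Y
            joint = Xv.T @ Yv / n_docs            # P(token = x_val, label = y_val)
            p_x = Xv.mean(axis=0)[:, None]        # P(token = x_val)
            p_y = Yv.mean(axis=0)[None, :]        # P(label = y_val)
            mi += joint * np.log((joint + eps) / (p_x * p_y + eps))
    return mi


def planted_attention(token_ids, mi, temperature=1.0):
    """Initial attention of every label over one document's tokens.

    Assumption of this sketch: MI scores are used directly as attention
    logits, standing in for the scores a trained ranking model would give.
    Returns an (n_labels, seq_len) matrix whose rows sum to 1.
    """
    logits = mi[token_ids].T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


# Toy corpus (illustrative only): 4 documents, 3 tokens, 2 labels.
# Token 0 co-occurs with label 0, token 1 with label 1, token 2 is everywhere.
X = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

mi = mutual_information(X, Y)
attn = planted_attention([0, 2], mi)   # a document containing tokens 0 and 2
```

In this toy setup, label 0 starts out attending more to the informative token 0 than to the uninformative token 2, before any classifier training. Note that mutual information is symmetric in sign, so a token that is perfectly anti-correlated with a label scores as high as a perfectly correlated one; a real system would need to handle that, for example through the ranking model itself.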

