Debjyoti Saha Roy

h-index42
2papers

2 Papers

28.8CLJun 5
Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces

Debjyoti Saha Roy, Byron C. Wallace, Javed A. Aslam

Modern reasoning models offer surprisingly strong zero-shot performance on challenging multi-label tasks that require selecting a small set of relevant options from hundreds of thousands to millions of candidate labels. We investigate how they achieve this mechanistically. We characterize reasoning as a two-phase process: A broad "shortlisting" of candidates followed by fine-grained reasoning over the resulting set. We provide evidence across a range of datasets that these steps can be isolated and are complementary. Using this characterization, we develop a mechanistic distillation strategy that consistently outperforms standard distillation.

CLOct 30, 2024Code
Don't Pay Attention, PLANT It: Pretraining Attention via Learning-to-Rank

Debjyoti Saha Roy, Byron C. Wallace, Javed A. Aslam

State-of-the-art Extreme Multi-Label Text Classification models rely on multi-label attention to focus on key tokens in input text, but learning good attention weights is challenging. We introduce PLANT - Pretrained and Leveraged Attention - a plug-and-play strategy for initializing attention. PLANT works by planting label-specific attention using a pretrained Learning-to-Rank model guided by mutual information gain. This architecture-agnostic approach integrates seamlessly with large language model backbones such as Mistral-7B, LLaMA3-8B, DeepSeek-V3, and Phi-3. PLANT outperforms state-of-the-art methods across tasks including ICD coding, legal topic classification, and content recommendation. Gains are especially pronounced in few-shot settings, with substantial improvements on rare labels. Ablation studies confirm that attention initialization is a key driver of these gains. For code and trained models, see https://github.com/debjyotiSRoy/xcube/tree/plant