CL · Sep 2, 2022

Extend and Explain: Interpreting Very Long Language Models

arXiv:2209.01174v3 · 7 citations · h-index: 78
Originality: Incremental advance
AI Analysis

The paper addresses the need for auditable and trustworthy machine learning models in medical applications, where documents can be very long. It is an incremental advance because it builds on existing explainability techniques.

The paper tackles the problem of explaining predictions from sparse attention language models on long text, particularly in medical contexts. It introduces a Masked Sampling Procedure (MSP) that identifies about 1.7x more clinically informative text blocks than the previous state of the art and runs up to 100x faster.

While Transformer language models (LMs) are state-of-the-art for information extraction, long text introduces computational challenges requiring suboptimal preprocessing steps or alternative model architectures. Sparse attention LMs can represent longer sequences, overcoming performance hurdles. However, it remains unclear how to explain predictions from these models, as not all tokens attend to each other in the self-attention layers, and long sequences pose computational challenges for explainability algorithms when runtime depends on document length. These challenges are severe in the medical context where documents can be very long, and machine learning (ML) models must be auditable and trustworthy. We introduce a novel Masked Sampling Procedure (MSP) to identify the text blocks that contribute to a prediction, apply MSP in the context of predicting diagnoses from medical text, and validate our approach with a blind review by two clinicians. Our method identifies about 1.7x more clinically informative text blocks than the previous state-of-the-art, runs up to 100x faster, and is tractable for generating important phrase pairs. MSP is particularly well-suited to long LMs but can be applied to any text classifier. We provide a general implementation of MSP.
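The abstract does not spell out the steps of MSP, so the following is only a minimal sketch of the general idea of masked sampling for block-level attribution, not the authors' implementation. It assumes a black-box classifier that returns a probability for one label; the function name `masked_sampling_importance`, its parameters (`block_size`, `mask_prob`, `n_samples`, `mask_token`), and the scoring rule (average probability drop over the samples in which a block was masked) are all hypothetical choices made for illustration.

```python
import random
from typing import Callable, List, Sequence


def masked_sampling_importance(
    tokens: Sequence[str],
    predict_proba: Callable[[List[str]], float],
    block_size: int = 4,       # assumed fixed-size blocks; hypothetical choice
    mask_prob: float = 0.2,    # assumed per-block masking probability
    n_samples: int = 200,      # number of randomly masked copies to score
    mask_token: str = "[MASK]",
    seed: int = 0,
) -> List[float]:
    """Score each block by the mean drop in predicted probability
    over the samples in which that block was masked."""
    rng = random.Random(seed)
    n_blocks = (len(tokens) + block_size - 1) // block_size
    baseline = predict_proba(list(tokens))

    drop_sum = [0.0] * n_blocks
    mask_count = [0] * n_blocks

    for _ in range(n_samples):
        # Choose a random subset of blocks to mask in this sample.
        masked_blocks = [b for b in range(n_blocks) if rng.random() < mask_prob]
        if not masked_blocks:
            continue

        masked = list(tokens)
        for b in masked_blocks:
            start = b * block_size
            end = min(start + block_size, len(tokens))
            masked[start:end] = [mask_token] * (end - start)

        # Credit the observed drop to every block masked in this sample.
        drop = baseline - predict_proba(masked)
        for b in masked_blocks:
            drop_sum[b] += drop
            mask_count[b] += 1

    return [s / c if c else 0.0 for s, c in zip(drop_sum, mask_count)]


# Toy stand-in for a real text classifier (hypothetical):
# high probability only when the word "pneumonia" is visible.
def toy_classifier(toks: List[str]) -> float:
    return 0.9 if "pneumonia" in toks else 0.1


note = ("patient presents with cough and fever chest xray "
        "shows pneumonia in left lobe").split()
scores = masked_sampling_importance(note, toy_classifier)
top_block = scores.index(max(scores))
```

In this sketch the number of model calls is set by `n_samples` rather than by the number of tokens, and the classifier is only ever queried as a black box, which is in line with the abstract's remark that MSP can be applied to any text classifier. The block granularity, masking rate, and scoring rule used in the paper may differ.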

Code Implementations: 2 repos