CL · Apr 7, 2020

Exemplar Auditing for Multi-Label Biomedical Text Classification

arXiv:2004.03093v1 · 8 citations
AI Analysis

The work gives healthcare workers a tool for understanding model predictions on noisy, high-dimensional text data, but the contribution is incremental, since it builds on an existing method.

The authors tackled the problem of semi-supervised discovery in biomedical text classification by generalizing a zero-shot sequence labeling method to handle high-dimensional document-level labels, achieving competitive effectiveness on a MIMIC-III multi-label classification task.

Many practical applications of AI in medicine consist of semi-supervised discovery: The investigator aims to identify features of interest at a resolution more fine-grained than that of the available human labels. This is often the scenario faced in healthcare applications as coarse, high-level labels (e.g., billing codes) are often the only sources that are readily available. These challenges are compounded for modalities such as text, where the feature space is very high-dimensional, and often contains considerable amounts of noise. In this work, we generalize a recently proposed zero-shot sequence labeling method, "binary labeling via a convolutional decomposition", to the case where the available document-level human labels are themselves relatively high-dimensional. The approach yields classification with "introspection", relating the fine-grained features of an inference-time prediction to their nearest neighbors from the training set, under the model. The approach is effective, yet parsimonious, as demonstrated on a well-studied MIMIC-III multi-label classification task of electronic health record data, and is useful as a tool for organizing the analysis of neural model predictions and high-dimensional datasets. Our proposed approach yields both a competitively effective classification model and an interrogation mechanism to aid healthcare workers in understanding the salient features that drive the model's predictions.
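
The "introspection" step described in the abstract, relating features of an inference-time prediction to their nearest neighbors in the training set, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes feature vectors have already been extracted by some model, uses plain Euclidean distance, and the function name `nearest_exemplars`, the toy vectors, and the toy label sets are all hypothetical.

```python
import math

def nearest_exemplars(query, train_feats, k=3):
    """Return (index, distance) pairs for the k training vectors closest to query.

    Assumption: Euclidean distance over precomputed feature vectors; the paper's
    actual features and distance are not specified in this abstract.
    """
    dists = [math.dist(query, feats) for feats in train_feats]
    order = sorted(range(len(train_feats)), key=lambda i: dists[i])[:k]
    return [(i, dists[i]) for i in order]

# Hypothetical toy data: four training feature vectors, each with a multi-label set.
train_feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
train_labels = [{"A"}, {"A", "B"}, {"C"}, {"B", "C"}]

# A feature vector from an inference-time prediction (also hypothetical).
query = [0.9, 0.1]

neighbors = nearest_exemplars(query, train_feats, k=2)
for idx, dist in neighbors:
    # Show which training exemplars, and their labels, sit closest to the query.
    print(f"train[{idx}] labels={sorted(train_labels[idx])} distance={dist:.3f}")
```

Inspecting the labels of the retrieved exemplars alongside the prediction is one simple way to audit whether a prediction is supported by similar training cases.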

