Discovering Invariances in Healthcare Neural Networks
This work addresses the interpretability of neural networks in healthcare by revealing model invariances. The contribution is incremental: it applies existing invariance learning methods to clinical data.
The authors identify input features that healthcare predictive models ignore by learning invariant transformations that minimize changes in predictions. They find that LSTM models on clinical time series are invariant to certain variables, especially when trained to be adversarially robust, and that BioBERT on clinical notes is invariant to certain words.
We study the invariance characteristics of pre-trained predictive models by empirically learning transformations on the input that leave the prediction function approximately unchanged. To learn invariant transformations, we minimize the Wasserstein distance between the predictive distribution conditioned on the data instances and the predictive distribution conditioned on the transformed data instances. To avoid finding degenerate or perturbative transformations, we add a similarity regularization to discourage similarity between the data and its transformed values. We theoretically analyze the correctness of the algorithm and the structure of the solutions. Applying the proposed technique to clinical time series data, we discover variables that commonly used LSTM models do not rely on for their prediction, especially when the LSTM is trained to be adversarially robust. We also analyze the invariances of BioBERT on clinical notes and discover words that it is invariant to.
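The objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy logistic model, the fixed additive shifts standing in for a learned transformation, the Gaussian-kernel similarity penalty, the weight `lam`, and every function name are assumptions made for illustration. The sketch only evaluates the objective for three candidate transformations rather than optimizing over them, and it approximates the Wasserstein distance with the 1-D empirical form over a batch of predictions.

```python
import numpy as np

def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have the same size")
    return float(np.mean(np.abs(a - b)))

def toy_model(x, w):
    """Hypothetical stand-in for a pre-trained predictor (logistic regression)."""
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def invariance_objective(x, x_t, w, lam=0.1):
    """Prediction-distribution distance plus a penalty on staying close to x."""
    dist = wasserstein_1d(toy_model(x, w), toy_model(x_t, w))
    # Assumed similarity regularizer: near 1 when x_t is close to x, near 0 when far.
    similarity = float(np.mean(np.exp(-np.sum((x - x_t) ** 2, axis=1))))
    return dist + lam * similarity

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 3))
w = np.array([1.5, -2.0, 0.0])  # the model ignores the third feature

obj_identity = invariance_objective(x, x.copy(), w)                      # degenerate
obj_ignored = invariance_objective(x, x + np.array([0.0, 0.0, 3.0]), w)  # invariant
obj_used = invariance_objective(x, x + np.array([3.0, 0.0, 0.0]), w)     # not invariant
```

In this toy setting, shifting the ignored feature leaves the predictions unchanged while moving the input away from the original, so it scores lower than both the identity map (penalized as degenerate) and a shift of a feature the model actually uses (penalized by the distance term).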