Prediction-Constrained Topic Models for Antidepressant Recommendation
This incremental work addresses a specific bottleneck in clinical data analysis, improving both the interpretability and the accuracy of antidepressant recommendations.
The authors address the problem of balancing generative and predictive goals in supervised topic models for clinical tasks. From electronic health records, their models produce better antidepressant recommendations than existing methods.
Supervisory signals can help topic models discover low-dimensional data representations that are more interpretable for clinical tasks. We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels. Existing approaches fail to balance these goals because they do not properly handle a fundamental asymmetry: the intended task is always predicting labels from data, not data from labels. Our new prediction-constrained objective trains models that predict labels well from held-out data while also producing good generative likelihoods and interpretable topic-word parameters. In a case study on predicting depression medications from electronic health records, we demonstrate improved recommendations compared to previous supervised topic models and to high-dimensional logistic regression on words alone.
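To make the balance between the two goals concrete, here is a minimal sketch of one plausible form of a prediction-constrained loss. It assumes that the constraint on label prediction is relaxed into a single weighted penalty: the negative generative log-likelihood of the words plus `lam` times the negative log-probability of the labels. The specific choices are illustrative assumptions, not the authors' exact formulation. These choices are: the logistic label model on topic proportions, the fixed per-document proportions `pi`, and the names `pc_objective` and `lam` (all hypothetical). In a full implementation, `pi` would itself be inferred from each document's words using `phi`, so that labels are predicted from data alone, which respects the asymmetry described above.

```python
import numpy as np


def pc_objective(X, y, pi, phi, eta, lam):
    """Relaxed prediction-constrained loss (lower is better); hypothetical sketch.

    X   : (D, V) word-count matrix
    y   : (D,) binary labels in {0, 1}
    pi  : (D, K) per-document topic proportions (rows sum to 1), assumed to be
          inferred from X alone and held fixed here for simplicity
    phi : (K, V) topic-word distributions (rows sum to 1, strictly positive)
    eta : (K,) weights of an assumed logistic label model on pi
    lam : weight on the label term (a relaxed Lagrange multiplier)
    """
    # Generative term, up to the multinomial constant:
    # log p(x_d | pi_d, phi) = sum_v x_dv * log(sum_k pi_dk * phi_kv)
    word_probs = pi @ phi                      # (D, V)
    gen_loglik = np.sum(X * np.log(word_probs))

    # Predictive term: log p(y_d | pi_d, eta) under a logistic model.
    scores = pi @ eta                          # (D,)
    log_p1 = -np.logaddexp(0.0, -scores)       # log sigmoid(s), numerically stable
    log_p0 = -np.logaddexp(0.0, scores)        # log (1 - sigmoid(s))
    pred_loglik = np.sum(y * log_p1 + (1 - y) * log_p0)

    # Larger lam pushes training toward accurate label prediction.
    return -gen_loglik - lam * pred_loglik


# Toy example: 2 documents, 3 vocabulary words, 2 topics (made-up numbers).
X = np.array([[3, 0, 1], [0, 2, 2]])
y = np.array([1, 0])
pi = np.array([[0.9, 0.1], [0.2, 0.8]])
phi = np.array([[0.7, 0.1, 0.2], [0.1, 0.5, 0.4]])
eta = np.array([2.0, -2.0])

for lam in (0.0, 1.0, 10.0):
    print(lam, pc_objective(X, y, pi, phi, eta, lam))
```

Setting `lam = 0` recovers a purely generative (unsupervised) loss, while increasing `lam` makes label prediction dominate. Choosing this weight is what lets a single objective trade off generative fit against predictive accuracy.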