ML | Dec 6, 2016

Supervised topic models for clinical interpretability

Michael C. Hughes, Huseyin Melih Elibol, Thomas McCoy, Roy Perlis, Finale Doshi-Velez

arXiv:1612.01678v1 | 7.19 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses interpretable topic modeling for clinical researchers, but it appears incremental: it builds on existing sLDA methods and contributes mainly optimization improvements.

The paper tackles two problems with supervised Latent Dirichlet Allocation (sLDA): labels have negligible influence on training, and conditional independence assumptions lead to poor predictions. It investigates penalized optimization methods for training sLDA, using recognition networks for faster inference, and reports preliminary results on synthetic data and on predicting successful anti-depressant medication from patients' diagnostic histories.
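The label-influence problem can be made concrete. In the joint log-likelihood, a document contributes one term per word token but only one term per label, so with hundreds of tokens the label term barely moves the objective. The toy sketch below measures that share. Its topics, label weights, and the logistic label model on θ are illustrative assumptions (standard sLDA regresses the label on the empirical topic frequencies z̄), and the multiplier `lam` on the label term is a generic up-weighting fix shown for intuition, not a claim about the paper's penalized objective.

```python
import math

# Toy setup (hypothetical): K=2 topics over a V=4 word vocabulary.
PHI = [[0.4, 0.4, 0.1, 0.1],
       [0.1, 0.1, 0.4, 0.4]]
THETA = [0.7, 0.3]        # one document's topic proportions
ETA = [2.0, -2.0]         # logistic-regression weights for a binary label
DOC = [0, 1, 2, 3] * 125  # 500 word tokens paired with a single label


def word_loglik(doc, theta, phi):
    """Sum over tokens of log p(w_n | theta, phi), with p(w) = sum_k theta_k * phi[k][w]."""
    return sum(math.log(sum(t * row[w] for t, row in zip(theta, phi))) for w in doc)


def label_loglik(y, theta, eta):
    """log p(y | theta, eta) for a binary label under logistic regression on theta.
    Simplification (assumed): standard sLDA regresses on empirical topic frequencies z-bar."""
    p = 1.0 / (1.0 + math.exp(-sum(e * t for e, t in zip(eta, theta))))
    return math.log(p if y == 1 else 1.0 - p)


def objective(doc, y, theta, phi, eta, lam=1.0):
    """Weighted joint log-likelihood: lam=1 is the plain joint likelihood,
    lam > 1 up-weights the label term (a generic, assumed fix)."""
    return word_loglik(doc, theta, phi) + lam * label_loglik(y, theta, eta)


def label_share(doc, y, theta, phi, eta, lam=1.0):
    """Fraction of the objective's magnitude contributed by the label term."""
    w = abs(word_loglik(doc, theta, phi))
    l = abs(lam * label_loglik(y, theta, eta))
    return l / (w + l)


if __name__ == "__main__":
    for lam in (1.0, float(len(DOC))):
        share = label_share(DOC, 1, THETA, PHI, ETA, lam)
        print(f"lambda = {lam:g}: label term is {share:.2%} of the objective")
```

With `lam = 1` the label term is about 0.05% of this document's objective; setting `lam` to the number of tokens raises it to roughly a fifth.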

Supervised topic models can help clinical researchers find interpretable co-occurrence patterns in count data that are relevant for diagnostics. However, standard formulations of supervised Latent Dirichlet Allocation have two problems. First, when documents have many more words than labels, the influence of the labels will be negligible. Second, due to conditional independence assumptions in the graphical model, the impact of supervised labels on the learned topic-word probabilities is often minimal, leading to poor predictions on held-out data. We investigate penalized optimization methods for training sLDA that produce interpretable topic-word parameters and useful held-out predictions, using recognition networks to speed up inference. We report preliminary results on synthetic data and on predicting successful anti-depressant medication given a patient's diagnostic history.
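Recognition networks amortize inference: instead of running an iterative optimization for each document's topic proportions θ, a network trained once maps a document's word counts to θ in a single forward pass. The toy sketch below contrasts the two. Everything in it is an illustrative assumption rather than the paper's architecture: the hand-set topics `PHI`, the maximum-likelihood EM (no Dirichlet prior) used as the slow per-document baseline, and the single linear+softmax layer trained to reproduce EM's outputs.

```python
import math

# Toy topics (hypothetical): K=2 topics over a V=4 word vocabulary.
PHI = [[0.4, 0.4, 0.1, 0.1],
       [0.1, 0.1, 0.4, 0.4]]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def em_theta(counts, phi, iters=200):
    """Slow path: per-document EM for theta given fixed topics phi
    (maximum likelihood, no Dirichlet prior). Must be rerun for every document."""
    k_topics, n = len(phi), sum(counts)
    theta = [1.0 / k_topics] * k_topics
    for _ in range(iters):
        new = [0.0] * k_topics
        for v, c in enumerate(counts):
            if c == 0:
                continue
            denom = sum(theta[j] * phi[j][v] for j in range(k_topics))
            for k in range(k_topics):
                new[k] += c * theta[k] * phi[k][v] / denom  # expected topic counts
        theta = [x / n for x in new]
    return theta


def encode(counts, weights, bias):
    """Fast path: one forward pass of a linear+softmax encoder mapping
    normalized word counts to topic proportions theta."""
    n = sum(counts)
    return softmax([bias[k] + sum(weights[v][k] * c / n for v, c in enumerate(counts))
                    for k in range(len(bias))])


def train_encoder(docs, phi, lr=2.0, epochs=500):
    """Fit the encoder to reproduce EM's theta on training docs by gradient
    descent on the cross-entropy between EM targets and encoder outputs."""
    n_vocab, k_topics = len(phi[0]), len(phi)
    weights = [[0.0] * k_topics for _ in range(n_vocab)]
    bias = [0.0] * k_topics
    targets = [em_theta(d, phi) for d in docs]  # slow inference, done once offline
    for _ in range(epochs):
        for counts, target in zip(docs, targets):
            pred = encode(counts, weights, bias)
            n = sum(counts)
            for k in range(k_topics):
                g = pred[k] - target[k]  # d(cross-entropy)/d(score_k)
                bias[k] -= lr * g
                for v, c in enumerate(counts):
                    weights[v][k] -= lr * g * c / n
    return weights, bias


if __name__ == "__main__":
    train_docs = [[45, 45, 5, 5], [5, 5, 45, 45], [25, 25, 25, 25], [35, 35, 15, 15]]
    W, b = train_encoder(train_docs, PHI)
    held_out = [40, 40, 10, 10]
    print("EM theta:     ", [round(t, 3) for t in em_theta(held_out, PHI)])
    print("encoder theta:", [round(t, 3) for t in encode(held_out, W, b)])
```

The design trade-off is the usual one for amortized inference: the encoder's answer is only approximate, but once trained it costs a single pass per new document instead of hundreds of EM iterations.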
