Multinomial belief networks for healthcare data
This work addresses uncertainty quantification and pattern discovery in healthcare data for medical researchers, though it appears incremental because it builds on existing models.
The authors tackled the challenge of analyzing sparse, high-missingness healthcare data by proposing a deep generative Bayesian model for multinomial count data, which successfully identified biologically meaningful clusters of mutational signatures in cancer DNA data in a data-driven manner.
Healthcare data from patient or population cohorts are often characterized by sparsity, high missingness and relatively small sample sizes. In addition, being able to quantify uncertainty is often important in a medical context. To address these analytical requirements we propose a deep generative Bayesian model for multinomial count data. We develop a collapsed Gibbs sampling procedure that takes advantage of a series of augmentation relations, inspired by the Zhou–Cong–Chen model. We visualise the model's ability to identify coherent substructures in the data using a dataset of handwritten digits. We then apply it to a large experimental dataset of DNA mutations in cancer and show that we can identify biologically meaningful clusters of mutational signatures in a fully data-driven way.
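To make the idea of a generative model for multinomial counts with an augmentation step more concrete, here is a minimal single-layer sketch. It is not the authors' model or sampler: the flat Dirichlet priors, the dimensions, and the function names `sample_counts` and `augment_counts` are all assumptions chosen for illustration. The sketch draws counts from a mixture of latent components, then splits each observed count into per-component latent counts, which is the kind of data augmentation that Gibbs samplers for count models commonly rely on.

```python
import numpy as np


def sample_counts(rng, n_features=6, n_topics=3, n_samples=4, total=50):
    """Draw multinomial counts from a one-layer mixture (illustrative only)."""
    # Each column of phi is a distribution over features for one component.
    phi = rng.dirichlet(np.ones(n_features), size=n_topics).T    # shape (V, K)
    # Each column of theta holds one sample's mixing weights over components.
    theta = rng.dirichlet(np.ones(n_topics), size=n_samples).T   # shape (K, J)
    probs = phi @ theta                                          # shape (V, J), columns sum to 1
    x = np.stack(
        [rng.multinomial(total, probs[:, j]) for j in range(n_samples)],
        axis=1,
    )
    return phi, theta, x


def augment_counts(rng, x, phi, theta):
    """Split each observed count x[v, j] into latent per-component counts."""
    n_features, n_samples = x.shape
    n_topics = phi.shape[1]
    latent = np.zeros((n_features, n_topics, n_samples), dtype=np.int64)
    for v in range(n_features):
        for j in range(n_samples):
            # Allocation probabilities are proportional to phi[v, k] * theta[k, j].
            w = phi[v] * theta[:, j]
            latent[v, :, j] = rng.multinomial(x[v, j], w / w.sum())
    return latent


rng = np.random.default_rng(0)
phi, theta, x = sample_counts(rng)
latent = augment_counts(rng, x, phi, theta)
```

Summing `latent` over its component axis recovers `x` exactly, so the augmentation adds latent structure without changing the observed data. A deep model of the kind described above would stack further layers on the mixing weights, which this sketch does not attempt.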