Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis
This work addresses genetic data analysis for identifying disease risk factors, but it appears incremental as it builds on existing methods like latent Dirichlet allocation.
The researchers analyze large, sparse genetic data to identify risk factors for four types of cancer and for autism spectrum disorder. They extend latent Dirichlet allocation to multiple dimensions and add hierarchical topic modeling; the resulting models are more coherent than the baselines.
We analyze large, multi-dimensional, sparse count data sets, discovering groups without supervision to provide new insights into genetic data. We form groups of genes and biological pathways based on patients' variants to find risk factors shared across four common types of cancer (breast, lung, prostate, and colorectal) and autism spectrum disorder. To accomplish this, we extend latent Dirichlet allocation to multiple dimensions and design distinct methods for hierarchical topic modeling. We find that our conditional hierarchical Bayesian Tucker decomposition models are more coherent than baseline models.
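To make the idea of extending latent Dirichlet allocation to multiple dimensions concrete, the following is a minimal generative sketch of a Tucker-style topic model for a single patient. It is not the authors' model: the two modes (genes and pathways), the topic counts, the Dirichlet concentration of 0.5, and all function names are assumptions chosen for illustration. Each mode has its own set of topics, and a per-patient core matrix holds the joint weight of every (gene topic, pathway topic) pair, so the probability of a (gene, pathway) cell is a sum over topic pairs.

```python
import random


def dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def tucker_cell_probs(core, gene_topics, pathway_topics):
    """Return P(gene j, pathway k) = sum over (b, c) of
    core[b][c] * gene_topics[b][j] * pathway_topics[c][k]."""
    n_genes = len(gene_topics[0])
    n_pathways = len(pathway_topics[0])
    probs = [[0.0] * n_pathways for _ in range(n_genes)]
    for b, gene_dist in enumerate(gene_topics):
        for c, path_dist in enumerate(pathway_topics):
            weight = core[b][c]
            for j in range(n_genes):
                for k in range(n_pathways):
                    probs[j][k] += weight * gene_dist[j] * path_dist[k]
    return probs


def sample_counts(probs, n_events, rng):
    """Draw n_events (gene, pathway) cells and tally them into a count matrix."""
    cells = [(j, k) for j in range(len(probs)) for k in range(len(probs[0]))]
    weights = [probs[j][k] for j, k in cells]
    counts = [[0] * len(probs[0]) for _ in probs]
    for j, k in rng.choices(cells, weights=weights, k=n_events):
        counts[j][k] += 1
    return counts


# Hypothetical sizes, chosen only to keep the example small.
N_GENES, N_PATHWAYS = 6, 4
N_GENE_TOPICS, N_PATHWAY_TOPICS = 2, 2
ALPHA = 0.5  # assumed concentration; values below 1 favor sparse distributions

rng = random.Random(0)

# One distribution over genes per gene topic, one over pathways per pathway topic.
gene_topics = [dirichlet([ALPHA] * N_GENES, rng) for _ in range(N_GENE_TOPICS)]
pathway_topics = [dirichlet([ALPHA] * N_PATHWAYS, rng) for _ in range(N_PATHWAY_TOPICS)]

# Per-patient core: a joint distribution over (gene topic, pathway topic) pairs.
flat_core = dirichlet([ALPHA] * (N_GENE_TOPICS * N_PATHWAY_TOPICS), rng)
core = [
    flat_core[b * N_PATHWAY_TOPICS:(b + 1) * N_PATHWAY_TOPICS]
    for b in range(N_GENE_TOPICS)
]

probs = tucker_cell_probs(core, gene_topics, pathway_topics)
total_prob = sum(sum(row) for row in probs)

# Simulate a small count matrix of variants for this hypothetical patient.
counts = sample_counts(probs, n_events=20, rng=rng)
total_count = sum(sum(row) for row in counts)
```

With a single mode and a core vector in place of the core matrix, the same sum reduces to the usual latent Dirichlet allocation mixture of topic-word distributions. The sketch covers only the generative direction; inference and the conditional, hierarchical structure named in the title are outside its scope.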