Stratification of patient trajectories using covariate latent variable models
This work, aimed at medical researchers, addresses the challenge of accurately modeling disease progression in complex biological systems. The contribution appears incremental: it builds on existing latent variable models and adds a novel extension for handling covariates.
The authors tackle the problem of learning continuous disease progression scores from dynamic omics data, an analysis that clinical covariates can confound, by introducing covariate latent variable models. They apply the model to TCGA colorectal cancer RNA-seq data and show that it can identify genes that stratify patients on an immune-response trajectory.
Standard models assign disease progression to discrete categories or stages based on well-characterized clinical markers. However, such a system is potentially at odds with our understanding of the underlying biology, which in highly complex systems may support a (near-)continuous evolution of disease from inception to terminal state. To learn such a continuous disease score, one could infer a latent variable from dynamic "omics" data, such as RNA-seq, that correlates with an outcome of interest such as survival time. However, such analyses may be confounded by additional data such as clinical covariates recorded in electronic health records (EHRs). To address this, we introduce covariate latent variable models, a novel type of latent variable model that learns a low-dimensional data representation in the presence of two (asymmetric) views of the same data source. We apply our model to TCGA colorectal cancer RNA-seq data and demonstrate how incorporating microsatellite-instability (MSI) status as an external covariate allows us to identify genes that stratify patients on an immune-response trajectory. Finally, we propose an extension, termed Covariate Gaussian Process Latent Variable Models, for learning nonparametric, nonlinear representations. An R package implementing variational inference for covariate latent variable models is available at http://github.com/kieranrcampbell/clvm.
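To make the confounding issue concrete, the following is a minimal sketch in Python, not the clvm package and not the variational inference described above. It assumes a simple linear generative form in which each gene's expression is the sum of a covariate effect, a latent-trajectory effect, and Gaussian noise, and it swaps in a two-step stand-in (regress out the covariate, then take the first principal component of the residuals) for the actual model. The function names `simulate` and `estimate_latent`, the binary covariate standing in for something like MSI status, and all simulation parameters are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n_patients=200, n_genes=50, noise_sd=0.5, seed=0):
    """Simulate expression data under an ASSUMED linear model:
    y[n, g] = x[n] * beta[g] + z[n] * c[g] + noise.
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n_patients).astype(float)  # binary covariate
    z = rng.normal(size=n_patients)                        # latent trajectory
    beta = rng.normal(size=n_genes)                        # covariate effect per gene
    c = rng.normal(size=n_genes)                           # latent loading per gene
    noise = rng.normal(scale=noise_sd, size=(n_patients, n_genes))
    y = np.outer(x, beta) + np.outer(z, c) + noise
    return y, x, z


def estimate_latent(y, x):
    """Stand-in estimator (NOT the paper's inference scheme):
    remove the covariate by least squares, then take the first
    principal component of the residuals as the latent score.
    """
    design = np.column_stack([np.ones_like(x), x])  # intercept + covariate
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    u, s, _ = np.linalg.svd(residuals, full_matrices=False)
    return u[:, 0] * s[0]


y, x, z_true = simulate()
z_hat = estimate_latent(y, x)

# The sign of a principal component is arbitrary, so compare absolute correlation.
corr = abs(np.corrcoef(z_true, z_hat)[0, 1])
print(f"|correlation| between true and estimated latent score: {corr:.2f}")
```

In this toy setting the recovered score is uncorrelated with the covariate by construction, which is the behaviour one wants from a trajectory that is not simply a proxy for the covariate. The model described in the abstract instead treats the covariate and the latent variable jointly, which this two-step sketch does not attempt.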