ML · LG · ME · Jun 23, 2024

VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data

arXiv:2406.16227v2 · 2 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses biomedical researchers' need for computationally efficient clustering algorithms in precision medicine. It appears incremental, since it builds on existing variational inference methods and adds enhancements such as variable selection and model averaging.

The authors address the clustering of high-dimensional categorical biomedical data with VICatMix, a variational Bayesian finite mixture model that they report outperforms competitors in efficiency while maintaining high accuracy. They demonstrate this on simulated data and on real-world datasets from TCGA, applied to cancer subtyping and driver gene discovery.

Effective clustering of biomedical data is crucial in precision medicine, enabling accurate stratification of patients or samples. However, the growth in availability of high-dimensional categorical data, including 'omics data, necessitates computationally efficient clustering algorithms. We present VICatMix, a variational Bayesian finite mixture model designed for the clustering of categorical data. The use of variational inference (VI) in its training allows the model to outperform competitors in terms of efficiency, while maintaining high accuracy. VICatMix furthermore performs variable selection, enhancing its performance on high-dimensional, noisy data. The proposed model incorporates summarisation and model averaging to mitigate poor local optima in VI, allowing for improved estimation of the true number of clusters simultaneously with feature saliency. We demonstrate the performance of VICatMix with both simulated and real-world data, including applications to datasets from The Cancer Genome Atlas (TCGA), showing its use in cancer subtyping and driver gene discovery. We demonstrate VICatMix's utility in integrative cluster analysis with different 'omics datasets, enabling the discovery of novel subtypes. Availability: VICatMix is freely available as an R package, incorporating C++ for faster computation, at https://github.com/j-ackierao/VICatMix.
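
As a rough illustration of what a finite mixture model for categorical data computes, the sketch below evaluates the posterior cluster probabilities of a single observation under fixed parameters. It is not VICatMix's code or API: the function names, the toy parameters, and the use of point estimates are all assumptions made for illustration. A variational treatment would typically replace the log parameters with their expectations under the approximate posterior.

```python
import math


def responsibilities(x, log_pi, log_phi):
    """Posterior cluster probabilities for one categorical observation.

    x       : list of category indices, one per variable
    log_pi  : log mixing weights, length K
    log_phi : log_phi[k][j][c] = log P(variable j = c | cluster k)
    (All names here are hypothetical, chosen for this sketch.)
    """
    # Unnormalised log posterior: log pi_k + sum_j log phi[k][j][x_j]
    scores = [
        log_pi[k] + sum(log_phi[k][j][c] for j, c in enumerate(x))
        for k in range(len(log_pi))
    ]
    # Normalise with log-sum-exp for numerical stability
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [math.exp(s - log_norm) for s in scores]


def log_table(probs):
    """Convert nested probability tables to log scale."""
    return [[[math.log(p) for p in var] for var in cluster] for cluster in probs]


# Toy, made-up parameters: K = 2 clusters, 3 binary variables
pi = [0.5, 0.5]
phi = [
    [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]],  # cluster 0 favours category 0
    [[0.2, 0.8], [0.1, 0.9], [0.5, 0.5]],  # cluster 1 favours category 1
]

r = responsibilities([0, 0, 1], [math.log(p) for p in pi], log_table(phi))
print(r)
```

In this toy setup the third variable has the same distribution in both clusters, so it does not affect the result. Variables like this carry no clustering signal, which is the kind of feature a variable selection step aims to flag.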

Code Implementations: 1 repo