ML, LG, AP, ME | Apr 15, 2019

Multiple kernel learning for integrative consensus clustering of 'omic datasets

arXiv:1904.07701v4 | 3 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for robust integrative clustering methods in biomedical research, such as tumour subtyping, but is incremental as it builds on existing approaches like COCA.

The authors tackle the problem of integrating multiple datasets for clustering, particularly in cancer subtyping. They benchmark the existing COCA method and propose KLIC as an alternative that uses multiple kernel learning to weight each dataset's contribution, and they report improved robustness in simulations and in real data applications.

Diverse applications - particularly in tumour subtyping - have demonstrated the importance of integrative clustering techniques for combining information from multiple data sources. Cluster-Of-Clusters Analysis (COCA) is one such approach that has been widely applied in the context of tumour subtyping. However, the properties of COCA have never been systematically explored, and its robustness to the inclusion of noisy datasets, or datasets that define conflicting clustering structures, is unclear. We rigorously benchmark COCA, and present Kernel Learning Integrative Clustering (KLIC) as an alternative strategy. KLIC frames the challenge of combining clustering structures as a multiple kernel learning problem, in which different datasets each provide a weighted contribution to the final clustering. This allows the contribution of noisy datasets to be down-weighted relative to more informative datasets. We compare the performances of KLIC and COCA in a variety of situations through simulation studies. We also present the output of KLIC and COCA in real data applications to cancer subtyping and transcriptional module discovery. R packages "klic" and "coca" are available on the Comprehensive R Archive Network.
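The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the code from the "klic" or "coca" R packages, and it does not learn the weights: the weights here are chosen by hand purely to show the effect of down-weighting a noisy dataset. The function names `coclustering_kernel` and `combine_kernels`, the toy labels, and the use of a binary co-clustering matrix as the per-dataset kernel are all assumptions made for illustration.

```python
import numpy as np


def coclustering_kernel(labels):
    """Binary co-clustering matrix: entry (i, j) is 1 if items i and j share a cluster.

    Illustrative stand-in for a per-dataset kernel; it is positive semidefinite
    because it equals Z @ Z.T for the one-hot cluster indicator matrix Z.
    """
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices, with weights normalised to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))


# Toy data (hypothetical): two datasets agree on the clustering, one is noisy.
informative_a = [0, 0, 0, 1, 1, 1]
informative_b = [0, 0, 0, 1, 1, 1]
noisy = [0, 1, 0, 1, 0, 1]

kernels = [coclustering_kernel(x) for x in (informative_a, informative_b, noisy)]

# Equal weights treat every dataset alike.
equal = combine_kernels(kernels, [1, 1, 1])

# Hand-picked weights (an assumption, not learned) down-weight the noisy dataset.
weighted = combine_kernels(kernels, [0.45, 0.45, 0.10])

# Items 0 and 1 share a cluster in the informative datasets;
# items 0 and 4 are grouped together only by the noisy one.
print(equal[0, 1], weighted[0, 1])
print(equal[0, 4], weighted[0, 4])
```

With the noisy dataset down-weighted, the combined similarity between items that the informative datasets group together rises, and the similarity contributed only by the noisy dataset shrinks. In a multiple kernel learning setting the weights would be estimated from the data rather than fixed in advance.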

Code Implementations: 1 repo
