Multi-Kernel LS-SVM Based Bio-Clinical Data Integration: Applications to Ovarian Cancer
This work addresses the problem of effectively using multi-modal data in cancer research, specifically for ovarian cancer. The contribution is incremental, as it applies an existing method to new data.
The authors tackled the challenge of integrating diverse bio-clinical data types for ovarian cancer by developing a multiple kernel LS-SVM pipeline, which resulted in higher log-rank statistics for patient stratification and improved accuracy in predicting clinical outcomes compared to using individual data types.
Medical research now makes it possible to acquire diverse types of data from the same individual for a particular cancer. Recent studies show that utilizing such diverse data results in more accurate predictions. The major challenge is how to use such diverse data sets effectively. In this paper, we introduce a multiple-kernel-based pipeline for integrative analysis of high-throughput molecular data (somatic mutation, copy number alteration, DNA methylation, and mRNA) and clinical data. We apply the pipeline to ovarian cancer data from TCGA. Once a combined kernel has been generated as the weighted sum of the individual kernels, it is used to stratify patients and predict clinical outcomes. We examine the survival time, vital status, and neoplasm cancer status of each subtype to verify how well the subtypes cluster. We also examine the power of molecular and clinical data to predict dichotomized overall survival and to classify the tumor grade of the cancer samples. We observed that integrating the various data types yields higher log-rank statistics. We were also able to predict clinical status with higher accuracy than when using individual data types.
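To make the kernel-combination step concrete, the following is a minimal sketch of a weighted-sum kernel fed into an LS-SVM, not the authors' implementation. Everything specific in it is an assumption for illustration: the two toy data views, the RBF kernel, the fixed equal weights, the values of `gamma` and `reg`, and all function names are hypothetical. It uses the function-estimation form of the LS-SVM linear system with labels in {-1, +1} and relies only on the Python standard library.

```python
import math

def rbf_kernel(X, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) over the rows of X."""
    n = len(X)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]

def combine_kernels(kernels, weights):
    """Weighted sum K = sum_m w_m * K_m of equally sized Gram matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def lssvm_fit(K, y, reg):
    """Solve [[0, 1^T], [1, K + I/reg]] [bias; alpha] = [0; y]."""
    n = len(y)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [K[i][j] + (1.0 / reg if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + [float(v) for v in y])
    return sol[0], sol[1:]

def lssvm_decision(K_row, alpha, bias):
    """Decision value f(x) = sum_i alpha_i * K(x, x_i) + bias."""
    return sum(a * k for a, k in zip(alpha, K_row)) + bias

# Toy data (assumed): two "views" of the same six samples, e.g. two data types.
view_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
view_b = [[1.0], [1.1], [0.9], [3.0], [3.2], [2.9]]
y = [-1, -1, -1, 1, 1, 1]

# One kernel per view, then a fixed equal-weight combination (assumed weights).
K = combine_kernels([rbf_kernel(view_a, 0.5), rbf_kernel(view_b, 0.5)], [0.5, 0.5])
bias, alpha = lssvm_fit(K, y, reg=10.0)
preds = [1 if lssvm_decision(K[i], alpha, bias) > 0 else -1 for i in range(len(y))]
print(preds)
```

In a real analysis the kernel weights and hyperparameters would be tuned (for example by cross-validation) rather than fixed, and the combined kernel could equally be passed to a kernel-based clustering method for patient stratification.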