Identifying Outliers using Influence Function of Multiple Kernel Canonical Correlation Analysis
This work addresses outlier detection in imaging genetics. The contribution is incremental: it extends existing kernel methods to handle multiple datasets.
The authors tackled the problem of identifying outliers in imaging genetics and multi-source data by proposing an influence function for multiple kernel canonical correlation analysis and a visualization method for detecting influential observations. Experiments on synthetic and real data (SNP, fMRI, DNA methylation) showed the method effectively analyzes outliers in high-dimensional biomedical applications.
Imaging genetics research has largely focused on discovering unique and co-association effects, while typically neglecting to identify outliers or atypical objects in genetic as well as non-genetic variables. Identifying significant outliers is an essential and challenging issue for imaging genetics and multi-source data analysis. Therefore, identified outliers need to be examined for transcription errors. First, we address the influence function (IF) of the kernel mean element, the kernel covariance operator, the kernel cross-covariance operator, kernel canonical correlation analysis (kernel CCA), and multiple kernel CCA. Second, we propose an IF of multiple kernel CCA, which can be applied to more than two datasets. Third, we propose a visualization method to detect influential observations in multiple sources of data based on the IF of kernel CCA and multiple kernel CCA. Finally, the proposed methods are capable of analyzing outliers among subjects, as commonly found in biomedical applications in which the number of dimensions is large. To examine the outliers, we use the stem-and-leaf display. Experiments on both synthesized and imaging genetics data (e.g., SNP, fMRI, and DNA methylation) demonstrate that the proposed visualization can be applied effectively.
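To make the idea of an influence-based outlier score concrete, the following is a minimal sketch covering only the simplest ingredient listed in the abstract, the kernel mean element. It is not the paper's IF for kernel CCA or multiple kernel CCA. It assumes the standard form of the influence function of a mean functional, IF(z) = k(., z) - (mean element), and scores each observation by the RKHS norm of that quantity. The Gaussian kernel, the bandwidth `sigma`, the toy data, and the function names `gaussian_gram`, `mean_element_influence`, and `stem_and_leaf` are all illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np


def gaussian_gram(X, sigma):
    """Gram matrix of a Gaussian kernel (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def mean_element_influence(K):
    """RKHS norm of k(., x_i) minus the empirical kernel mean, for each i.

    ||k(., x_i) - m||^2 = K_ii - (2/n) sum_j K_ij + (1/n^2) sum_{j,l} K_jl
    """
    sq_norm = np.diag(K) - 2.0 * K.mean(axis=1) + K.mean()
    return np.sqrt(np.maximum(sq_norm, 0.0))


def stem_and_leaf(values, leaf_unit=0.01):
    """Return {stem: leaves} for non-negative values; leaves are last digits."""
    scaled = np.rint(np.asarray(values, dtype=float) / leaf_unit).astype(int)
    scaled.sort()
    stems, leaves = scaled // 10, scaled % 10
    # Keep empty stems so gaps in the distribution stay visible.
    display = {int(s): "" for s in range(stems.min(), stems.max() + 1)}
    for s, leaf in zip(stems, leaves):
        display[int(s)] += str(int(leaf))
    return display


# Toy data (hypothetical): 50 Gaussian points with one planted outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[0] = 8.0  # observation 0 is placed far from the rest

scores = mean_element_influence(gaussian_gram(X, sigma=1.0))
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem:3d} | {leaves}")
print("most influential observation:", int(np.argmax(scores)))
```

In the display, an observation far from the bulk of the data appears as an isolated leaf on the highest stem, which is the kind of pattern a stem-and-leaf view of influence values is meant to expose. Extending the score to kernel CCA would require the IF of the covariance and cross-covariance operators, which this sketch does not attempt.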