Interpreting multi-variate models with setPCA
This work addresses the interpretability challenge faced by researchers analyzing omics data, although the contribution is incremental: it builds on existing PCA methods, adding a new interface and algorithm.
The paper tackles the problem of interpreting multi-variate models such as PCA in omics data by developing a method that integrates model information with background-knowledge databases. The result is a GUI tool that overlays known sets onto loadings plots to improve interpretability.
Principal Component Analysis (PCA) and other multi-variate models are often used in the analysis of "omics" data. These models contain a great deal of information that is currently neither easily accessible nor interpretable. Here we present an algorithmic method that integrates this information with existing databases of background knowledge, stored in the form of known sets (for instance, genesets or pathways). To make this accessible, we have produced a Graphical User Interface (GUI) in Matlab that overlays known-set information onto the loadings plot and thus improves the interpretability of the multi-variate model. For each known set, a search algorithm finds the optimal convex hull covering a subset of the set's elements, and this hull is displayed. In this paper we discuss two main topics: the details of the search algorithm for finding the optimal convex hull, and the GUI, which is freely available for download for academic use.
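The abstract does not say how "optimal" is defined or which search strategy is used, so the following is only an illustrative sketch under stated assumptions, not the setPCA algorithm. It assumes each variable (e.g. a gene) has a 2-D loading on two chosen components, and it uses a hypothetical objective: the number of known-set members inside the hull minus a `penalty` for each non-member inside it. The search is a simple greedy "peeling" heuristic that repeatedly drops the hull vertex whose removal most improves the score. The names `greedy_hull`, `hull_score`, `penalty` and `min_size` are our own and are not part of the Matlab GUI.

```python
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); > 0 means b lies left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull: List[Point], p: Point) -> bool:
    """True if p lies inside or on a CCW hull (degenerate hulls: vertices only)."""
    if len(hull) < 3:
        return p in hull
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def hull_score(hull: List[Point], loadings: Dict[str, Point],
               known: Set[str], penalty: float) -> float:
    # HYPOTHETICAL objective: members covered minus penalised non-members covered.
    members = sum(1 for g in known if g in loadings and inside(hull, loadings[g]))
    others = sum(1 for g, p in loadings.items() if g not in known and inside(hull, p))
    return members - penalty * others


def greedy_hull(loadings: Dict[str, Point], known: Set[str],
                penalty: float = 2.0, min_size: int = 3):
    """Greedy peeling: drop one hull vertex at a time while the score improves."""
    subset = {g for g in known if g in loadings}
    hull = convex_hull([loadings[g] for g in subset])
    best = hull_score(hull, loadings, known, penalty)
    while len(subset) > min_size:
        vertices = set(hull)
        candidates = []
        for g in sorted(subset):
            if loadings[g] not in vertices:
                continue  # dropping a non-vertex point cannot change the hull
            h = convex_hull([loadings[x] for x in subset - {g}])
            candidates.append((hull_score(h, loadings, known, penalty), g, h))
        if not candidates:
            break
        score, g, h = max(candidates, key=lambda c: c[0])
        if score <= best:
            break  # local optimum: no single removal helps
        subset.discard(g)
        hull, best = h, score
    return subset, hull, best


# Toy loadings: known set A-E, where E is an outlier; X, Y, Z are not in the set.
loadings = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0), "D": (1.0, 1.0),
            "E": (5.0, 5.0), "X": (3.0, 3.0), "Y": (2.0, 2.5), "Z": (4.0, 2.0)}
subset, hull, score = greedy_hull(loadings, {"A", "B", "C", "D", "E"})
print(sorted(subset), hull, score)  # expected: ['A', 'B', 'C', 'D'], unit square, 4.0
```

In this toy case the full hull also covers two non-members, so peeling off the outlier `E` yields a tighter hull around `A` to `D`. Greedy peeling only reaches a local optimum; an exhaustive or branch-and-bound search could be substituted where known sets are small.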