Generalized Canonical Correlation Analysis for Disparate Data Fusion
This work addresses joint inference from multiple disparate data sources, a problem of interest to researchers and practitioners in data fusion. The contribution appears incremental, however, since it focuses on an efficiency analysis of existing methods.
The paper investigates the efficiency of Canonical Correlation Analysis (CCA) and Generalized Canonical Correlation Analysis (GCCA) for manifold matching in disparate data fusion, focusing on a text document classification task under various training conditions.
Manifold matching seeks embeddings of multiple disparate data spaces into a common low-dimensional space, where joint inference can be pursued. It is an enabling methodology for fusion and inference from multiple, massive, disparate data sources. In this paper we focus on Canonical Correlation Analysis (CCA) and its generalization, Generalized Canonical Correlation Analysis (GCCA), both of which belong to the more general Reduced Rank Regression (RRR) framework. We present an efficiency investigation of CCA and GCCA under different training conditions for a particular text document classification task.
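To make the embedding idea concrete, the following is a minimal sketch of two-view CCA, assuming NumPy and a standard whitening-plus-SVD formulation. It is an illustration only, not the implementation used in the paper: the function name `cca`, the ridge term `reg`, and the synthetic data are all assumptions introduced here. Two data spaces `X` and `Y`, observed on the same `n` objects, are mapped into a shared `d`-dimensional space in which the paired projections are maximally correlated.

```python
import numpy as np

def cca(X, Y, d, reg=1e-8):
    """Two-view CCA sketch: returns projections A, B and canonical correlations.

    X is (n, p) and Y is (n, q), with rows paired across the two views.
    `reg` is a small ridge term (an assumption of this sketch) that keeps
    the covariance matrices invertible.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each view
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    A = Wx @ U[:, :d]    # (p, d) projection for the X space
    B = Wy @ Vt[:d].T    # (q, d) projection for the Y space
    return A, B, s[:d]

# Synthetic example: two noisy views of one shared 2-dimensional latent signal.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))
X = Z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))
Y = Z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))
A, B, rho = cca(X, Y, d=2)

# Both views now live in the same 2-dimensional space.
X_embedded = (X - X.mean(axis=0)) @ A
Y_embedded = (Y - Y.mean(axis=0)) @ B
```

Once both spaces are embedded this way, a classifier trained on the embedded points of one space can in principle be applied to embedded points from the other, which is the joint-inference setting described above. GCCA extends the same idea to more than two spaces; several formulations exist, and this sketch does not cover them.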