LG · SY · ML · Mar 27, 2018

Canonical Correlation Analysis of Datasets with a Common Source Graph

arXiv:1803.10309v1 · 31 citations
Originality: Incremental advance
AI Analysis

This is an incremental improvement for researchers and practitioners in machine learning, particularly in dimensionality reduction and data fusion tasks.

The paper addresses the fact that standard CCA does not exploit the geometry of the common sources. It proposes a graph-regularized CCA (gCCA) that encodes this geometry as a graph regularizer, and reports improved performance in image classification tests on real datasets.

Canonical correlation analysis (CCA) is a powerful technique for discovering whether or not hidden sources are commonly present in two (or more) datasets. Its well-appreciated merits include dimensionality reduction, clustering, classification, feature selection, and data fusion. Standard CCA, however, does not exploit the geometry of the common sources, which may be available from the given data or can be deduced from (cross-)correlations. In this paper, this extra information provided by the common sources generating the data is encoded in a graph and invoked as a graph regularizer. This leads to a novel graph-regularized CCA approach, termed graph (g) CCA. The novel gCCA accounts for the graph-induced knowledge of the common sources while minimizing the distance between the sought canonical variables. For the many practical settings in which the number of data vectors is smaller than their dimension, the dual formulation of gCCA is also developed. One such setting arises when kernels are incorporated to account for nonlinear data dependencies. The resultant graph-kernel (gk) CCA is also obtained in closed form. Finally, image classification tests over several real datasets corroborate the merits of the novel linear, dual, and kernel approaches relative to competing alternatives.
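To make the idea of a graph regularizer concrete, here is a minimal NumPy sketch of one way a graph-regularized linear CCA could be set up. It is not taken from the paper. It assumes the objective "maximize 2 u'XY'v - gamma (u'XLX'u + v'YLY'v)" under a joint normalization of u and v, which reduces to a stacked generalized eigenproblem. The function names `graph_laplacian` and `gcca_sketch`, the parameters `gamma` and `eps`, and the small ridge term are illustrative assumptions, and the paper's exact formulation and constraints may differ.

```python
import numpy as np

def graph_laplacian(W):
    """Combinatorial Laplacian L = D - W of a symmetric adjacency matrix W."""
    return np.diag(W.sum(axis=1)) - W

def gcca_sketch(X, Y, L, gamma=0.1, eps=1e-3, k=2):
    """Illustrative graph-regularized CCA (assumed formulation, see lead-in).

    X: (dx, N) and Y: (dy, N) hold one sample per column.
    L: (N, N) Laplacian of a graph over the N samples (the common sources).
    Returns U: (dx, k) and V: (dy, k), the top-k projection directions.
    """
    # Center each feature (row) across the N samples.
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    dx, dy = X.shape[0], Y.shape[0]

    Cxy = X @ Y.T
    # A: cross-correlation off the diagonal, graph-smoothness penalty on it.
    A = np.block([[-gamma * (X @ L @ X.T), Cxy],
                  [Cxy.T, -gamma * (Y @ L @ Y.T)]])
    # B: block-diagonal autocorrelations; eps is an assumed ridge term
    # that keeps B positive definite.
    B = np.block([[X @ X.T + eps * np.eye(dx), np.zeros((dx, dy))],
                  [np.zeros((dy, dx)), Y @ Y.T + eps * np.eye(dy)]])

    # Solve A w = lambda B w by whitening with the Cholesky factor of B.
    R = np.linalg.cholesky(B)                          # B = R @ R.T
    M = np.linalg.solve(R, np.linalg.solve(R, A).T).T  # R^{-1} A R^{-T}
    M = (M + M.T) / 2                                  # remove round-off asymmetry
    evals, Z = np.linalg.eigh(M)                       # eigenvalues ascending
    top = np.argsort(evals)[::-1][:k]                  # indices of k largest
    Wvec = np.linalg.solve(R.T, Z[:, top])             # map back: w = R^{-T} z
    return Wvec[:dx], Wvec[dx:]
```

With `gamma=0` the sketch reduces to ordinary regularized CCA in its stacked form. Larger `gamma` values favor projections that vary smoothly over the graph. How the graph itself is built from the data is left open here.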

