IT LG Feb 3, 2020

Common Information Components Analysis

arXiv:2002.00779v33.38 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of multi-dataset feature extraction for researchers in machine learning and statistics, offering a theoretical foundation and algorithmic extension, though it appears incremental as it builds on existing CCA methods.

The paper addresses the extraction of common features from multiple high-dimensional datasets. It gives an information-theoretic interpretation of Canonical Correlation Analysis (CCA) via Wyner's common information, which leads to a novel algorithm, Common Information Components Analysis (CICA), that extends to more than two datasets.

We give an information-theoretic interpretation of Canonical Correlation Analysis (CCA) via (relaxed) Wyner's common information. CCA extracts, from two high-dimensional data sets, low-dimensional descriptions (features) that capture the commonalities between the data sets, using a framework of correlations and linear transforms. Our interpretation first extracts the common information up to a pre-selected resolution level, and then projects this back onto each of the data sets. In the case of Gaussian statistics, this procedure precisely reduces to CCA, where the resolution level specifies the number of CCA components that are extracted. This also suggests a novel algorithm, Common Information Components Analysis (CICA), with several desirable features, including a natural extension beyond just two data sets.
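For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of classical two-set CCA (whitening followed by an SVD), not the paper's CICA algorithm. The function name `cca`, the regularizer `reg`, and the synthetic data are illustrative assumptions that do not come from the paper; `k` plays the role of the number of extracted components.

```python
import numpy as np

def cca(X, Y, k, reg=1e-8):
    """Classical CCA sketch. X: (n, p), Y: (n, q), rows are samples.
    Returns projection matrices A (p, k), B (q, k) and the top-k
    canonical correlations. `reg` is a small ridge term (an assumption
    added here for numerical stability)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Sample covariance and cross-covariance matrices.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations, sorted in decreasing order.
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    A = Wx @ U[:, :k]
    B = Wy @ Vt[:k].T
    return A, B, s[:k]

# Synthetic example: two data sets driven by a shared 2-D latent signal.
rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal((n, 2))
X = z @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((n, 5))
Y = z @ rng.standard_normal((2, 4)) + 0.1 * rng.standard_normal((n, 4))

A, B, s = cca(X, Y, k=2)
# Low-dimensional features that capture what X and Y have in common.
fx = (X - X.mean(axis=0)) @ A
fy = (Y - Y.mean(axis=0)) @ B
```

Here the columns of `fx` and `fy` are the paired low-dimensional features, and `s` holds their correlations. Per the abstract, the paper's procedure reduces to this kind of computation only under Gaussian statistics.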
