LG · Oct 6, 2025

Provable Affine Identifiability of Nonlinear CCA under Latent Distributional Priors

arXiv:2510.04758v17.11 citations · h-index: 5

Originality Highly original

AI Analysis

This paper provides theoretical guarantees for the identifiability of nonlinear CCA, addressing a foundational problem in unsupervised learning that is relevant to both machine learning researchers and practitioners.

This paper establishes conditions under which nonlinear Canonical Correlation Analysis (CCA) can recover ground-truth latent factors up to an orthogonal transform after whitening, proving affine identifiability for a broad class of latent distributions in the population setting and extending these guarantees to finite samples via ridge-regularized empirical CCA.

In this work, we establish conditions under which nonlinear CCA recovers the ground-truth latent factors up to an orthogonal transform after whitening. Building on the classical result that linear mappings maximize canonical correlations under Gaussian priors, we prove affine identifiability for a broad class of latent distributions in the population setting. Central to our proof is a reparameterization result that transports the analysis from observation space to source space, where identifiability becomes tractable. We further show that whitening is essential for ensuring boundedness and well-conditioning, thereby underpinning identifiability. Beyond the population setting, we prove that ridge-regularized empirical CCA converges to its population counterpart, transferring these guarantees to the finite-sample regime. Experiments on a controlled synthetic dataset and a rendered image dataset validate our theory and demonstrate the necessity of its assumptions through systematic ablations.
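As a rough illustration of the ingredients named in the abstract (whitening, ridge regularization, and recovery up to an orthogonal transform), the sketch below runs ridge-regularized empirical CCA on a toy two-view dataset. It is not the paper's method: it covers only the classical linear case with Gaussian latents, and the function names `whiten` and `ridge_cca`, the mixing matrices, the noise level, and the ridge value are all assumptions chosen for the example.

```python
import numpy as np

def whiten(x, ridge=1e-3):
    """Center x and rescale it to roughly identity covariance (ridge-stabilized)."""
    xc = x - x.mean(axis=0)
    cov = xc.T @ xc / len(xc) + ridge * np.eye(x.shape[1])
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T  # cov^{-1/2}
    return xc @ inv_sqrt

def ridge_cca(x, y, ridge=1e-3):
    """Linear CCA via SVD of the whitened cross-covariance matrix."""
    xw, yw = whiten(x, ridge), whiten(y, ridge)
    cross = xw.T @ yw / len(xw)
    u, s, vt = np.linalg.svd(cross, full_matrices=False)
    return s, xw @ u, yw @ vt.T  # correlations, canonical variates of x and y

# Toy data (hypothetical): two noisy linear views of shared Gaussian latents z.
rng = np.random.default_rng(0)
n = 5000
z = rng.standard_normal((n, 2))
a = np.array([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, -1.0]])
b = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
x = z @ a + 0.1 * rng.standard_normal((n, 4))
y = z @ b + 0.1 * rng.standard_normal((n, 3))

corrs, x_c, y_c = ridge_cca(x, y)

# Cross-covariance between the whitened latents and the top-2 canonical variates.
# If z is recovered up to an orthogonal transform, this 2x2 matrix is close to
# orthogonal, i.e. its singular values are close to 1.
m = whiten(z, ridge=0.0).T @ x_c[:, :2] / n
m_singular_values = np.linalg.svd(m, compute_uv=False)

print("canonical correlations:", corrs)
print("singular values of latent-to-variate map:", m_singular_values)
```

In this toy setting the first two canonical correlations should be large and the third small, since only two latent factors are shared between the views. The ridge term keeps the whitening step well conditioned when a sample covariance is close to singular, at the cost of slightly shrinking the estimated correlations.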
