Deep Generalized Canonical Correlation Analysis
This work addresses the need for nonlinear multiview representation learning in domains such as speech processing and social media. Its contribution is to combine deep representation learning with many-view integration; in that sense it is an incremental extension of prior CCA approaches (Deep CCA and Generalized CCA) rather than a departure from them.
The authors address the problem of learning nonlinear transformations of multiple data views such that the transformed views are maximally informative of each other, and introduce Deep Generalized Canonical Correlation Analysis (DGCCA) for this purpose. They report that DGCCA representations outperform existing methods on phonetic transcription and hashtag recommendation, and that they generally perform no worse than standard linear many-view techniques, including on friend recommendation.
We present Deep Generalized Canonical Correlation Analysis (DGCCA), a method for learning nonlinear transformations of arbitrarily many views of data, such that the resulting transformations are maximally informative of each other. While methods for nonlinear two-view representation learning (Deep CCA; Andrew et al., 2013) and linear many-view representation learning (Generalized CCA; Horst, 1961) exist, DGCCA is the first CCA-style multiview representation learning technique that combines the flexibility of nonlinear (deep) representation learning with the statistical power of incorporating information from many independent sources, or views. We present the DGCCA formulation as well as an efficient stochastic optimization algorithm for solving it. We learn DGCCA representations on two distinct datasets for three downstream tasks: phonetic transcription from acoustic and articulatory measurements, and recommending hashtags and friends on a dataset of Twitter users. We find that DGCCA representations soundly beat existing methods at phonetic transcription and hashtag recommendation, and in general perform no worse than standard linear many-view techniques.
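To make the linear many-view building block concrete, the sketch below solves a Generalized CCA problem in NumPy. It is not the authors' implementation. It assumes the common least-squares form of GCCA: find a shared representation G with orthonormal columns and per-view maps U_j that minimize the sum over views of ||G - X_j U_j||_F^2. The function name `linear_gcca`, the ridge term `reg`, and the samples-as-rows layout are illustrative choices, not taken from the abstract. Under the same assumption, a deep variant would replace each X_j with the output of a per-view network and train those networks by gradient descent on this objective.

```python
import numpy as np


def linear_gcca(views, r, reg=1e-8):
    """Illustrative linear GCCA (assumed least-squares formulation).

    views: list of arrays, each of shape (n_samples, d_j)
    r:     dimensionality of the shared representation
    reg:   small ridge term for numerical stability (an assumption)

    Returns G of shape (n_samples, r) with orthonormal columns, and a
    list of per-view maps U_j of shape (d_j, r).
    """
    n = views[0].shape[0]
    M = np.zeros((n, n))
    centered, grams = [], []
    for X in views:
        # Center each view so that correlations are measured about the mean.
        Xc = X - X.mean(axis=0, keepdims=True)
        C = Xc.T @ Xc + reg * np.eye(Xc.shape[1])
        # Accumulate the (regularized) projection onto this view's column space.
        M += Xc @ np.linalg.solve(C, Xc.T)
        centered.append(Xc)
        grams.append(C)

    # The shared representation is spanned by the top-r eigenvectors of M.
    # eigh returns eigenvalues in ascending order, so take the last r columns.
    _, eigvecs = np.linalg.eigh(M)
    G = eigvecs[:, ::-1][:, :r]

    # For fixed G, each U_j is a ridge-regularized least-squares solution.
    U = [np.linalg.solve(C, Xc.T @ G) for Xc, C in zip(centered, grams)]
    return G, U
```

Note that this direct solution forms an n-by-n matrix, so it only suits small sample counts; the stochastic optimization algorithm mentioned in the abstract is what addresses larger datasets, and it is not reproduced here.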