Learning relevant features for statistical inference
This work addresses which features of one data view can be reliably inferred from another in multi-view data, offering a method for conditional inference and faster supervised training, though it appears incremental because it builds on DCCA and an analogy to the Schmidt decomposition.
The paper tackles the problem of identifying the features in one data view that can be reliably inferred from another. It shows that these are the most correlated variables in the sense of deep canonical correlation analysis (DCCA), and that they yield a non-parametric representation of the joint distribution for tasks such as Bayesian inference. It demonstrates the approach on occluded MNIST images, where the inferred representation captures multiple modes, and in supervised learning it provides automatic regularization and faster convergence than the cross-entropy objective.
Given two views of data, we consider the problem of finding the features of one view which can be most faithfully inferred from the other. We find that these are also the most correlated variables in the sense of deep canonical correlation analysis (DCCA). Moreover, we show that these variables can be used to construct a non-parametric representation of the implied joint probability distribution, which can be thought of as a classical version of the Schmidt decomposition of quantum states. This representation can be used to compute expectations of functions of one view conditioned on the other, such as Bayesian estimators and their standard deviations. We test the approach using inference on occluded MNIST images, and show that our representation contains multiple modes. Surprisingly, when applied to supervised learning (where one view consists of labels), this approach automatically provides regularization and faster convergence compared to the cross-entropy objective. We also explore using this approach to discover salient independent variables of a single dataset.
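As a minimal sketch, assume both views are discrete with a known joint probability table P[x, y]. In that finite setting the decomposition can be computed exactly with an SVD of the normalized matrix P[x, y] / sqrt(p(x) p(y)), and the resulting feature tables play the role that learned DCCA features play for continuous data. This gives p(x, y) = p(x) p(y) * sum_k s_k f_k(x) g_k(y), with s_0 = 1 and f_0 = g_0 = 1. It also gives E[h(X) | Y = y] = sum_k s_k E[h(X) f_k(X)] g_k(y). The function names `schmidt_decompose` and `conditional_expectation` are hypothetical, and this exact discrete computation is an illustration, not the authors' neural-network implementation.

```python
import numpy as np


def schmidt_decompose(P):
    """Decompose a discrete joint distribution P[x, y] (entries sum to 1).

    Returns marginals px, py, singular values s (descending, s[0] == 1) and
    feature tables f[x, k], g[y, k] such that
        P[x, y] = px[x] * py[y] * sum_k s[k] * f[x, k] * g[y, k].
    For k >= 1, f[:, k] and g[:, k] have zero mean and unit variance, and
    s[k] is the correlation between f_k(X) and g_k(Y).
    Assumes every marginal probability is strictly positive.
    """
    px = P.sum(axis=1)
    py = P.sum(axis=0)
    # Normalized matrix B = D_x^{-1/2} P D_y^{-1/2}; its largest singular value is 1.
    B = P / np.sqrt(np.outer(px, py))
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    f = U / np.sqrt(px)[:, None]
    g = Vt.T / np.sqrt(py)[:, None]
    # SVD fixes each singular pair only up to a joint sign flip; make f_0 = g_0 = +1.
    if f[0, 0] < 0:
        f[:, 0] *= -1
        g[:, 0] *= -1
    return px, py, s, f, g


def conditional_expectation(h, y, px, s, f, g):
    """E[h(X) | Y = y] = sum_k s[k] * E_X[h(X) f_k(X)] * g_k(y)."""
    coeffs = (px * h) @ f  # E_X[h(X) f_k(X)] for every k
    return float(np.sum(s * coeffs * g[y, :]))


# Hypothetical example: a random positive joint table over 5 x-values and 4 y-values.
rng = np.random.default_rng(0)
P = rng.random((5, 4))
P /= P.sum()
px, py, s, f, g = schmidt_decompose(P)

h = np.arange(5, dtype=float)  # X takes values 0..4; estimate E[X | Y = 2]
mean = conditional_expectation(h, 2, px, s, f, g)
var = conditional_expectation(h**2, 2, px, s, f, g) - mean**2
direct = P[:, 2] @ h / py[2]  # brute-force check from the joint table
print(mean, direct, np.sqrt(var))
```

The second call shows how a standard deviation follows from the same representation: apply it to h and h squared, then take Var = E[h^2 | y] - E[h | y]^2. Truncating the sum to the largest few s_k gives a low-rank approximation, which is where keeping only the most correlated features matters.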