ML LG

Nov 28, 2023

Identifiable Feature Learning for Spatial Data with Nonlinear ICA

Hermanni Hälvä, Jonathan So, Richard E. Turner, Aapo Hyvärinen

arXiv:2311.16849v14.33 citations

h-index: 74

Originality: Highly original

AI Analysis

This work addresses a bottleneck in representation learning for spatial data, offering a novel method with theoretical guarantees, though it is incremental in extending nonlinear ICA to higher-dimensional structures.

The paper addresses the restriction of existing nonlinear ICA methods to one-dimensional latent dependencies by introducing a framework that uses t-process latent components for spatial and spatio-temporal data. It shows that these components are identifiable under general conditions and demonstrates the method on simulated and real-world datasets.

Recently, nonlinear ICA has surfaced as a popular alternative to the many heuristic models used in deep representation learning and disentanglement. An advantage of nonlinear ICA is that a sophisticated identifiability theory has been developed; in particular, it has been proven that the original components can be recovered under sufficiently strong latent dependencies. Despite this general theory, practical nonlinear ICA algorithms have so far been mainly limited to data with one-dimensional latent dependencies, especially time-series data. In this paper, we introduce a new nonlinear ICA framework that employs $t$-process (TP) latent components, which apply naturally to data with higher-dimensional dependency structures, such as spatial and spatio-temporal data. In particular, we develop a new learning and inference algorithm that extends variational inference methods to handle the combination of a deep neural network mixing function with the TP prior, and employs the method of inducing points for computational efficiency. On the theoretical side, we show that such TP independent components are identifiable under very general conditions. Further, Gaussian process (GP) nonlinear ICA is established as a limit of the TP nonlinear ICA model, and we prove that the identifiability of the latent components at this GP limit is more restricted. Namely, those components are identifiable if and only if they have distinctly different covariance kernels. Our algorithm and identifiability theorems are explored on simulated spatial data and real-world spatio-temporal data.
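
The generative side of the model described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes an RBF kernel, a 12 x 12 spatial grid, hand-picked length scales, 4 degrees of freedom, and a random leaky-ReLU network as the mixing function, none of which come from the paper. It draws each independent component from a $t$-process using the standard construction of a multivariate $t$ variable as a GP draw rescaled by a single chi-square variable, then mixes the components nonlinearly. As the degrees of freedom grow, the rescaling factor tends to 1 and the draw approaches a GP sample, which is the limit the abstract refers to.

```python
import numpy as np

def rbf_kernel(coords, lengthscale):
    """Squared-exponential kernel on spatial coordinates (an assumed choice)."""
    sq_dist = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def sample_t_process(coords, lengthscale, nu, rng, jitter=1e-6):
    """Draw one t-process realisation at the given locations.

    Uses the scale-mixture form: a GP draw multiplied by sqrt(nu / u),
    where u ~ chi-square(nu) is shared across all locations.
    """
    n = coords.shape[0]
    K = rbf_kernel(coords, lengthscale) + jitter * np.eye(n)
    L = np.linalg.cholesky(K)
    gp_draw = L @ rng.standard_normal(n)
    scale = np.sqrt(nu / rng.chisquare(nu))
    return scale * gp_draw

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def mix(sources, rng, n_layers=2):
    """Hypothetical mixing function: random square layers with leaky ReLU.

    Random Gaussian square matrices are invertible with probability one,
    and leaky ReLU is invertible, so the mixing is invertible.
    """
    d = sources.shape[1]
    x = sources
    for layer in range(n_layers):
        W = rng.standard_normal((d, d))
        x = x @ W
        if layer < n_layers - 1:
            x = leaky_relu(x)
    return x

rng = np.random.default_rng(0)

# Regular 2-D grid of spatial locations in the unit square.
side = 12
gx, gy = np.meshgrid(np.linspace(0.0, 1.0, side), np.linspace(0.0, 1.0, side))
coords = np.column_stack([gx.ravel(), gy.ravel()])

# One latent component per length scale; distinct kernels per component.
lengthscales = [0.1, 0.3, 0.6]
nu = 4.0
S = np.column_stack(
    [sample_t_process(coords, ls, nu, rng) for ls in lengthscales]
)

# Observed data: nonlinear mixture of the independent spatial components.
X = mix(S, rng)
```

Here `S` holds the latent components (one column each, one row per grid location) and `X` the observations that a nonlinear ICA method would try to unmix. Giving each component its own length scale mirrors the abstract's condition for the GP limit, where identifiability requires distinctly different covariance kernels.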
