LG | AI | Sep 30, 2022

Relative representations enable zero-shot latent space communication

Oxford
arXiv:2209.15430v2 | 181 citations | h-index: 31
Originality: Incremental advance
AI Analysis

This work addresses latent space incompatibility, a problem for machine learning researchers and practitioners, and enables models to be reused and compared without additional training. It is rated incremental because it builds on existing representation learning methods.

The paper tackles incoherent latent spaces in neural networks, which arise from random initialization and other training variations. It proposes relative representations, which encode each sample by its similarity to a fixed set of anchors and rely on the observation that angles between encodings are preserved across training runs; this yields the desired invariance without retraining. The authors demonstrate zero-shot model stitching and latent space comparison across datasets, modalities (images, text, graphs), and architectures (CNNs, GCNs, transformers).

Neural networks embed the geometric structure of a data manifold lying in a high-dimensional space into latent representations. Ideally, the distribution of the data points in the latent space should depend only on the task, the data, the loss, and other architecture-specific constraints. However, factors such as the random weights initialization, training hyperparameters, or other sources of randomness in the training phase may induce incoherent latent spaces that hinder any form of reuse. Nevertheless, we empirically observe that, under the same data and modeling choices, the angles between the encodings within distinct latent spaces do not change. In this work, we propose the latent similarity between each sample and a fixed set of anchors as an alternative data representation, demonstrating that it can enforce the desired invariances without any additional training. We show how neural architectures can leverage these relative representations to guarantee, in practice, invariance to latent isometries and rescalings, effectively enabling latent space communication: from zero-shot model stitching to latent space comparison between diverse settings. We extensively validate the generalization capability of our approach on different datasets, spanning various modalities (images, text, graphs), tasks (e.g., classification, reconstruction) and architectures (e.g., CNNs, GCNs, transformers).
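The idea in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the latent similarity function (the abstract only speaks of angles between encodings) and uses synthetic Gaussian vectors in place of real encoder outputs. The function name `relative_representation`, the anchor count, and the scale factor are hypothetical choices made for illustration, not the authors' implementation. The second "latent space" is simulated by applying a random orthogonal map and a uniform rescaling to the first.

```python
import numpy as np


def relative_representation(latents: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """Encode each latent vector by its cosine similarity to every anchor.

    latents: array of shape (n_samples, dim)
    anchors: array of shape (n_anchors, dim), taken from the same latent space
    returns: array of shape (n_samples, n_anchors)
    """
    latents_unit = latents / np.linalg.norm(latents, axis=1, keepdims=True)
    anchors_unit = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return latents_unit @ anchors_unit.T


rng = np.random.default_rng(0)

# Stand-in for the absolute encodings produced by one trained model (assumption).
latents_a = rng.normal(size=(100, 16))

# Stand-in for a second model's latent space: the same points after a random
# orthogonal transformation (rotation/reflection) and a uniform rescaling.
orthogonal, _ = np.linalg.qr(rng.normal(size=(16, 16)))
latents_b = 3.7 * latents_a @ orthogonal

# The same samples serve as anchors in both spaces.
anchor_idx = rng.choice(latents_a.shape[0], size=10, replace=False)

rel_a = relative_representation(latents_a, latents_a[anchor_idx])
rel_b = relative_representation(latents_b, latents_b[anchor_idx])

# The absolute encodings differ, but the relative ones agree up to float error.
print("absolute encodings match:", np.allclose(latents_a, latents_b))
print("relative encodings match:", np.allclose(rel_a, rel_b))
```

Because cosine similarity depends only on angles, the two relative representations coincide even though the absolute coordinates do not. Note that this sketch covers orthogonal maps and rescalings only; cosine similarity is not invariant to translations of the latent space.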

