LG ML Oct 24, 2025

Equivariance by Contrast: Identifiable Equivariant Embeddings from Unlabeled Finite Group Actions

Tobias Schmidt, Steffen Schneider, Matthias Bethge

arXiv:2510.21706v1 h-index: 7

Originality: Highly original

AI Analysis

This work addresses general-purpose equivariant learning, particularly for computer vision, by learning an encoder from group action observations alone, including for non-abelian groups. It builds on existing contrastive and equivariant learning methods, so the contribution is partly incremental.

The paper tackles the problem of learning equivariant embeddings from unlabeled data pairs under finite group actions. It proposes Equivariance by Contrast (EbC), which jointly learns a latent space and a group representation without group-specific inductive biases. The resulting embeddings show high-fidelity equivariance, with group operations faithfully reproduced in latent space, on the infinite dSprites dataset and on synthetic data for the groups $O(n)$ and $GL(n)$.

We propose Equivariance by Contrast (EbC) to learn equivariant embeddings from observation pairs $(\mathbf{y}, g \cdot \mathbf{y})$, where $g$ is drawn from a finite group acting on the data. Our method jointly learns a latent space and a group representation in which group actions correspond to invertible linear maps -- without relying on group-specific inductive biases. We validate our approach on the infinite dSprites dataset with structured transformations defined by the finite group $G:= (R_m \times \mathbb{Z}_n \times \mathbb{Z}_n)$, combining discrete rotations and periodic translations. The resulting embeddings exhibit high-fidelity equivariance, with group operations faithfully reproduced in latent space. On synthetic data, we further validate the approach on the non-abelian orthogonal group $O(n)$ and the general linear group $GL(n)$. We also provide a theoretical proof for identifiability. While broad evaluation across diverse group types on real-world data remains future work, our results constitute the first successful demonstration of general-purpose encoder-only equivariant learning from group action observations alone, including non-trivial non-abelian groups and a product group motivated by modeling affine equivariances in computer vision.
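The property described in the abstract, that a group action on the data corresponds to an invertible linear map in latent space, can be written as $f(g \cdot \mathbf{y}) = \rho(g) f(\mathbf{y})$. The sketch below is a toy illustration of that property only and is not the EbC training procedure. It assumes a cyclic group $\mathbb{Z}_n$ acting on vectors by cyclic shifts and a fixed, hand-built linear encoder; the names `encode`, `rho`, `estimate_rho`, and `shift_matrix` are hypothetical and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # order of the cyclic group and data dimension (illustrative choice)

def shift_matrix(k, n):
    """Permutation matrix P with (P @ y)[i] == y[(i - k) % n], i.e. np.roll(y, k)."""
    return np.roll(np.eye(n), k, axis=0)

# Hypothetical encoder f(y) = A @ y. A is unit upper triangular, so det(A) = 1
# and A is guaranteed to be invertible. A learned encoder would replace this.
A = np.eye(n) + np.triu(rng.normal(size=(n, n)), k=1)

def encode(y):
    return A @ y

def rho(k):
    """Latent representation of 'shift by k': an invertible linear map."""
    return A @ shift_matrix(k, n) @ np.linalg.inv(A)

def estimate_rho(k, num_pairs=50):
    """Recover rho(k) from embedded pairs (f(y), f(g . y)) by least squares.

    This assumes the group element k is known for every pair, which is a
    simplification made for illustration only.
    """
    Y = rng.normal(size=(n, num_pairs))   # columns are samples y
    Z = A @ Y                             # embeddings f(y)
    Zg = A @ np.roll(Y, k, axis=0)        # embeddings f(g . y)
    # Solve Z.T @ X = Zg.T for X, so that M = X.T satisfies M @ Z = Zg.
    X = np.linalg.lstsq(Z.T, Zg.T, rcond=None)[0]
    return X.T

y = rng.normal(size=n)
k = 2
lhs = encode(np.roll(y, k))   # act on the data, then encode
rhs = rho(k) @ encode(y)      # encode, then act linearly in latent space
print("equivariance holds:", np.allclose(lhs, rhs))
print("rho(2) @ rho(3) is the identity:", np.allclose(rho(2) @ rho(3), np.eye(n)))
print("least-squares estimate matches rho(k):", np.allclose(estimate_rho(k), rho(k)))
```

Because the encoder here is linear and fixed, equivariance holds by construction. The point is only to make the relation between the data-space action and the latent-space map concrete, including the homomorphism property $\rho(g)\rho(h) = \rho(gh)$.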
