MLLGSTMay 23, 2024

Aligning Embeddings and Geometric Random Graphs: Informational Results and Computational Approaches for the Procrustes-Wasserstein Problem

arXiv:2405.14532v12 citationsh-index: 10NIPS
Originality Incremental advance
AI Analysis

This addresses a fundamental problem in matching noisy datasets for applications in natural language processing and computer vision, but appears incremental as it builds on existing models and methods.

The paper tackles the Procrustes-Wasserstein problem for aligning high-dimensional point clouds in an unsupervised setting, establishing information-theoretic bounds in different dimensional regimes and proposing the Ping-Pong algorithm that can retrieve the planted signal under certain conditions, with experimental comparisons to a state-of-the-art method.

The Procrustes-Wasserstein problem consists in matching two high-dimensional point clouds in an unsupervised setting, and has many applications in natural language processing and computer vision. We consider a planted model with two datasets $X,Y$ that consist of $n$ datapoints in $\mathbb{R}^d$, where $Y$ is a noisy version of $X$, up to an orthogonal transformation and a relabeling of the data points. This setting is related to the graph alignment problem in geometric models. In this work, we focus on the euclidean transport cost between the point clouds as a measure of performance for the alignment. We first establish information-theoretic results, in the high ($d \gg \log n$) and low ($d \ll \log n$) dimensional regimes. We then study computational aspects and propose the Ping-Pong algorithm, alternatively estimating the orthogonal transformation and the relabeling, initialized via a Franke-Wolfe convex relaxation. We give sufficient conditions for the method to retrieve the planted signal after one single step. We provide experimental results to compare the proposed approach with the state-of-the-art method of Grave et al. (2019).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes