LG CV IR ML · Jun 3, 2020

Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm

arXiv:2006.02330v2 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses an often-overlooked question: how well multi-modal nonlinear embeddings generalize to unseen data. The question matters to researchers and practitioners in multi-modal learning, though the contribution appears incremental because it builds on existing representation learning methods.

The paper tackles the problem of generalizing multi-modal nonlinear embeddings to unseen data. It derives performance bounds and proposes an algorithm that jointly optimizes the embeddings and the Lipschitz regularity of the interpolators that extend them. It reports promising results in multi-modal image classification and cross-modal image-text retrieval.

While many approaches exist in the literature to learn low-dimensional representations for data collections in multiple modalities, the generalizability of multi-modal nonlinear embeddings to previously unseen data is a rather overlooked subject. In this work, we first present a theoretical analysis of learning multi-modal nonlinear embeddings in a supervised setting. Our performance bounds indicate that for successful generalization in multi-modal classification and retrieval problems, the regularity of the interpolation functions extending the embedding to the whole data space is as important as the between-class separation and cross-modal alignment criteria. We then propose a multi-modal nonlinear representation learning algorithm that is motivated by these theoretical findings, where the embeddings of the training samples are optimized jointly with the Lipschitz regularity of the interpolators. Experimental comparison to recent multi-modal and single-modal learning algorithms suggests that the proposed method yields promising performance in multi-modal image classification and cross-modal image-text retrieval applications.
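The abstract does not state the objective in closed form, so the following is only a hypothetical sketch of how the three criteria it names (cross-modal alignment, between-class separation, and interpolator regularity) could be combined into one score. It assumes two paired modalities, Gaussian RBF interpolators, squared-distance alignment and separation terms, and weights `mu` and `lam`; all function names and parameters are illustrative and are not the authors' algorithm. The Lipschitz term uses a standard bound for a Gaussian RBF interpolator: each kernel's gradient norm peaks at sqrt(2)/sigma * exp(-1/2), so the constant is at most that value times the sum of the coefficient norms.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix K[i, j] = exp(-||A[i] - B[j]||^2 / sigma^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def rbf_coefficients(X, Y, sigma, ridge=1e-8):
    """Coefficients C (N x k) so that f(x) = sum_i C[i] * exp(-||x - X[i]||^2 / sigma^2)
    maps each training point X[i] to its embedding Y[i]. The tiny ridge term is an
    assumption added only for numerical stability of the linear solve."""
    Phi = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(Phi + ridge * np.eye(len(X)), Y)


def rbf_lipschitz_bound(X, Y, sigma):
    """Upper bound on the Lipschitz constant of the Gaussian RBF interpolator:
    L <= sqrt(2) / sigma * exp(-1/2) * sum_i ||C[i]||."""
    C = rbf_coefficients(X, Y, sigma)
    return np.sqrt(2.0) / sigma * np.exp(-0.5) * np.linalg.norm(C, axis=1).sum()


def multimodal_objective(Xa, Xb, Ya, Yb, labels, sigma_a, sigma_b, mu=1.0, lam=0.1):
    """Hypothetical score (lower is better): cross-modal misalignment
    - mu * between-class separation + lam * Lipschitz bounds of both interpolators."""
    # Cross-modal alignment: paired samples (same row index) should embed close together.
    alignment = np.mean(np.sum((Ya - Yb) ** 2, axis=1))
    # Between-class separation over the pooled embeddings of both modalities.
    Y = np.vstack([Ya, Yb])
    lab = np.concatenate([labels, labels])
    D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    separation = D[lab[:, None] != lab[None, :]].mean()
    # Regularity of the interpolators that extend each embedding to unseen data.
    lipschitz = rbf_lipschitz_bound(Xa, Ya, sigma_a) + rbf_lipschitz_bound(Xb, Yb, sigma_b)
    return alignment - mu * separation + lam * lipschitz, (alignment, separation, lipschitz)


# Usage on synthetic data (stand-ins for image and text features, two classes).
rng = np.random.default_rng(0)
N = 20
labels = np.repeat([0, 1], N // 2)
Xa = rng.normal(size=(N, 5)) + 2.0 * labels[:, None]
Xb = rng.normal(size=(N, 3)) + 2.0 * labels[:, None]
# Perfectly aligned embedding: class 0 -> (-1, -1), class 1 -> (1, 1) in both modalities.
Ya = np.where(labels[:, None] == 0, -1.0, 1.0) * np.ones((N, 2))
Yb = Ya.copy()
loss, (align, sep, lip) = multimodal_objective(Xa, Xb, Ya, Yb, labels, sigma_a=2.0, sigma_b=2.0)
print(f"alignment={align:.3f} separation={sep:.3f} lipschitz={lip:.3f} loss={loss:.3f}")
```

Because the interpolation coefficients depend linearly on the embedding, stretching the embedding to increase class separation also inflates the Lipschitz bound; the weight `lam` trades these off, which mirrors the abstract's point that regularity matters alongside separation and alignment.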
