LG, AI, CL, CV | Dec 4, 2020

Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment

arXiv:2012.02813v1 | 37 citations
AI Analysis

This work tackles the problem of efficient learning in low-resource modalities for multimodal AI systems, which is particularly relevant for scenarios where data collection is difficult or expensive.

This paper addresses the challenge of learning in low-resource modalities by proposing a meta-alignment method that enables models to quickly adapt to new tasks in a target modality, even when trained on a different source modality. The method demonstrates strong performance on three classification tasks (text to image, image to audio, and text to speech), even with only 1-10 labeled samples and noisy labels in the target modality.

The natural world is abundant with concepts expressed via visual, acoustic, tactile, and linguistic modalities. Much of the existing progress in multimodal learning, however, focuses primarily on problems where the same set of modalities is present at train and test time, which makes learning in low-resource modalities particularly difficult. In this work, we propose algorithms for cross-modal generalization: a learning paradigm to train a model that can (1) quickly perform new tasks in a target modality (i.e., meta-learning) and (2) do so while being trained on a different source modality. We study a key research question: how can we ensure generalization across modalities despite using separate encoders for different source and target modalities? Our solution is based on meta-alignment, a novel method to align representation spaces using strongly and weakly paired cross-modal data while ensuring quick generalization to new tasks across different modalities. We study this problem on 3 classification tasks: text to image, image to audio, and text to speech. Our results demonstrate strong performance even when the new target modality has only a few (1-10) labeled samples and in the presence of noisy labels, a scenario particularly prevalent in low-resource modalities.
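
To make the idea of aligning two representation spaces more concrete, here is a minimal NumPy sketch. The abstract does not spell out the alignment objective, so this is not the authors' meta-alignment algorithm: it assumes a symmetric InfoNCE-style contrastive loss over strongly paired embeddings, plus a nearest-prototype classifier as a stand-in for few-shot prediction in the target modality. All function names and the temperature value are hypothetical, and the embeddings are assumed to come from separate source and target encoders that are not shown.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def alignment_loss(src_emb, tgt_emb, temperature=0.1):
    """Symmetric contrastive loss (assumed objective, not taken from the paper).

    Row i of src_emb and row i of tgt_emb are treated as a strongly paired
    example; every other row in the batch acts as a negative.
    """
    s = l2_normalize(src_emb)
    t = l2_normalize(tgt_emb)
    logits = s @ t.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(s))                 # matching pairs lie on the diagonal
    loss_s2t = -log_softmax(logits)[idx, idx].mean()
    loss_t2s = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_s2t + loss_t2s)


def prototype_predict(support_emb, support_labels, query_emb):
    """Few-shot prediction: assign each query to the most similar class mean."""
    classes = np.unique(support_labels)
    prototypes = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in classes]
    )
    sims = l2_normalize(query_emb) @ l2_normalize(prototypes).T
    return classes[sims.argmax(axis=1)]
```

In this sketch, a low `alignment_loss` means paired source and target embeddings sit close together in the shared space. Once that holds, `prototype_predict` can classify target-modality queries from only a handful of labeled support embeddings, which is the low-resource setting the abstract describes.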

Code Implementations: 1 repo