CL · Aug 29, 2019

Probing Representations Learned by Multimodal Recurrent and Transformer Models

arXiv:1908.11125v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of understanding and improving sentence representations, which is relevant to researchers in natural language processing and multimodal learning. Its contribution is incremental: it compares existing architectures rather than proposing new ones.

The study compares how recurrent and Transformer models learn sentence representations from different training signals. It finds that a sentence counterpart in a target language or in the visual modality provides a stronger training signal than language modeling, and that recurrent models perform better on semantic relevance tasks even though Transformers achieve better translation quality.

Recent literature shows that large-scale language modeling provides excellent reusable sentence representations with both recurrent and self-attentive architectures. However, there has been less clarity on the commonalities and differences in the representational properties induced by the two architectures. It has also been shown that visual information serves as one of the means for grounding sentence representations. In this paper, we present a meta-study assessing the representational quality of models where the training signal is obtained from different modalities, in particular, language modeling, image feature prediction, and both textual and multimodal machine translation. We evaluate textual and visual features of sentence representations obtained using predominant approaches on image retrieval and semantic textual similarity. Our experiments reveal that on moderate-sized datasets, a sentence counterpart in a target language or visual modality provides a much stronger training signal for sentence representation than language modeling. Importantly, we observe that while the Transformer models achieve superior machine translation quality, representations from the recurrent neural network based models perform significantly better on tasks focused on semantic relevance.
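The abstract names two probing tasks, semantic textual similarity (STS) and image retrieval. The sketch below shows how such probes are commonly scored, under stated assumptions: sentence embeddings have already been extracted from some encoder, STS is scored as the Spearman correlation between cosine similarities and gold similarity ratings, and retrieval is scored as Recall@K in a shared sentence/image space where query i's correct image is image i. All function names are hypothetical, and this is not the authors' evaluation code.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sts_spearman(pairs: List[Tuple[Vec, Vec]], gold: Sequence[float]) -> float:
    """Hypothetical STS probe: Spearman = Pearson correlation of the ranks."""
    predicted = [cosine(a, b) for a, b in pairs]
    return pearson(ranks(predicted), ranks(gold))


def recall_at_k(sentences: List[Vec], images: List[Vec], k: int) -> float:
    """Hypothetical retrieval probe: fraction of queries whose paired image ranks in the top k."""
    hits = 0
    for i, query in enumerate(sentences):
        sims = [cosine(query, img) for img in images]
        top_k = sorted(range(len(images)), key=lambda j: sims[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(sentences)


if __name__ == "__main__":
    # Toy 2-d embeddings, purely illustrative.
    pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 1.0])]
    print(sts_spearman(pairs, [5.0, 2.5, 0.0]))
    print(recall_at_k([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]], k=1))
```

Using rank correlation rather than raw Pearson makes the STS score insensitive to how an encoder's cosine similarities are scaled, which matters when comparing representations from recurrent and Transformer models that may occupy very different ranges.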
