NE · CV · LG · MM · Jan 27, 2020

aiTPR: Attribute Interaction-Tensor Product Representation for Image Caption

arXiv:2001.09545v1 · 10 citations
AI Analysis

This paper addresses the problem of generating more accurate and better-correlated captions in image captioning systems, though it appears incremental, building on existing region-based approaches.

The paper tackles the problem of biased or uncorrelated captions in image captioning by proposing Attribute Interaction-Tensor Product Representation (aiTPR), which gathers information through orthogonal combinations and learns interactions as tensors. The method outperformed previous works on the MSCOCO dataset, with interaction portions contributing heavily to better caption quality.

Region visual features enhance the generative capability of feature-based models; however, they lack proper interaction-aware attention and thus end up producing biased or uncorrelated sentences, or pieces of misinformation. In this work, we propose Attribute Interaction-Tensor Product Representation (aiTPR), a convenient way of gathering more information through orthogonal combination and of learning the interactions as physical entities (tensors) to improve the captions. Compared to previous works, where features are added up into undefined feature spaces, TPR helps maintain sanity in the combinations, and orthogonality helps define familiar spaces. We introduce a new concept layer that defines the objects and also their interactions, which can play a crucial role in determining different descriptions. The interaction portions contribute heavily to better caption quality, and the approach outperforms several previous works in this domain on the MSCOCO dataset. We introduce, for the first time, the notion of combining regional image features and abstracted interaction likelihood embedding for image captioning.
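
The abstract contrasts adding features together with combining them through a tensor product over orthogonal vectors. The sketch below illustrates only the generic tensor product representation idea (binding "filler" vectors to orthonormal "role" vectors and recovering them again); it is not the aiTPR architecture. The function names, the toy feature values, and the use of standard basis vectors as roles are all assumptions made for illustration.

```python
# Illustrative sketch of generic tensor product representation (TPR) binding.
# This is NOT the aiTPR model; all names and values here are hypothetical.

def outer(u, v):
    """Outer product of u and v as a nested list (len(u) rows, len(v) columns)."""
    return [[a * b for b in v] for a in u]

def add(m1, m2):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def bind(fillers, roles):
    """Sum of the outer products of each filler with its role."""
    total = [[0.0] * len(roles[0]) for _ in fillers[0]]
    for f, r in zip(fillers, roles):
        total = add(total, outer(f, r))
    return total

def unbind(tensor, role):
    """Recover the filler bound to `role`; exact when the roles are orthonormal."""
    return [sum(t * r for t, r in zip(row, role)) for row in tensor]

# Toy vectors standing in for two regional image features (assumed values).
fillers = [[0.2, 0.9, 0.1], [0.7, 0.3, 0.5]]
# Orthonormal role vectors (standard basis) standing in for two slots.
roles = [[1.0, 0.0], [0.0, 1.0]]

tpr = bind(fillers, roles)
recovered = unbind(tpr, roles[1])  # yields the second filler again

# Contrast: plain element-wise addition mixes the two features, and neither
# can be recovered from the sum alone.
summed = [a + b for a, b in zip(*fillers)]
```

Because the roles are orthonormal, each filler occupies its own subspace of the combined tensor and can be read back without interference, which is one way to interpret the abstract's remark that orthogonality helps define familiar spaces.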

