CV LG | May 5, 2019

On Exploring Undetermined Relationships for Visual Relationship Detection

Yibing Zhan, Jun Yu, Ting Yu, Dacheng Tao

arXiv:1905.01595v1 | 16.283 citations | h-index: 33

Originality: Incremental advance

AI Analysis

The paper addresses a specific bottleneck in computer vision by leveraging unlabeled data to improve visual relationship detection, which represents an incremental advance.

The paper tackles the problem of visual relationship detection by exploiting undetermined relationships from unlabeled data, achieving a top-50 relation detection recall improvement from 19.5% to 23.9% on the VRD dataset.

In visual relationship detection, human-annotated relationships can be regarded as determinate relationships. However, a large amount of data remains unlabeled, such as object pairs with less significant relationships or with no relationship at all. We refer to these unlabeled but potentially useful data as undetermined relationships. Although a vast body of literature exists, few methods exploit these undetermined relationships for visual relationship detection. In this paper, we explore the beneficial effect of undetermined relationships on visual relationship detection. We propose a novel multi-modal feature based undetermined relationship learning network (MF-URLN), which achieves substantial improvements in relationship detection. In detail, the MF-URLN automatically generates undetermined relationships by comparing object pairs with human-annotated data according to a designed criterion. The MF-URLN then extracts and fuses features of object pairs from three complementary modalities: visual, spatial, and linguistic. Further, the MF-URLN uses two correlated subnetworks: one decides the determinate confidence, and the other predicts the relationships. We evaluate the MF-URLN on two datasets: the Visual Relationship Detection (VRD) and the Visual Genome (VG) datasets. Comparisons with state-of-the-art methods verify the significant improvements made by the undetermined relationships; for example, the top-50 relation detection recall improves from 19.5% to 23.9% on the VRD dataset.
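The abstract does not spell out the criterion used to generate undetermined relationships, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical rule: a detected (subject, object) box pair counts as determinate when both boxes overlap an annotated pair with IoU at or above a threshold, and as undetermined otherwise. The function names `iou` and `label_pairs` and the 0.5 threshold are illustrative assumptions.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]
Pair = Tuple[Box, Box]  # (subject box, object box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_pairs(
    detected: List[Pair],
    annotated: List[Pair],
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[str]:
    """Label each detected pair as 'determinate' or 'undetermined'.

    Assumed criterion: a detected pair is determinate if its subject and
    object boxes both match some annotated pair at IoU >= threshold.
    """
    labels = []
    for subj, obj in detected:
        matched = any(
            iou(subj, gt_subj) >= threshold and iou(obj, gt_obj) >= threshold
            for gt_subj, gt_obj in annotated
        )
        labels.append("determinate" if matched else "undetermined")
    return labels


if __name__ == "__main__":
    annotated = [((0, 0, 10, 10), (20, 20, 30, 30))]
    detected = [
        ((0, 0, 10, 10), (20, 20, 30, 30)),    # matches the annotated pair
        ((50, 50, 60, 60), (20, 20, 30, 30)),  # subject box has no match
    ]
    print(label_pairs(detected, annotated))
```

Under this assumed rule, pairs left without a matching annotation are kept as undetermined training examples rather than discarded, which is the kind of unlabeled data the abstract proposes to exploit.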
