CV · Sep 23, 2023

Multi-modal Domain Adaptation for REG via Relation Transfer

arXiv:2309.13247v1 · 1.5 · h-index: 52

Originality: Incremental advance

AI Analysis

This work addresses the challenge of limited multi-modal data for REG, offering a domain adaptation method that is incremental over existing approaches.

The paper tackles multi-modal domain adaptation for Referring Expression Grounding (REG) by proposing a relation-transfer approach to improve knowledge transfer between domains, resulting in significant enhancements in adaptation performance.

Domain adaptation, which aims to transfer knowledge between domains, has been well studied in areas such as image classification and object detection. For multi-modal tasks, however, conventional approaches rely on large-scale pre-training, which is often impractical because multi-modal data is difficult to acquire. Domain adaptation, which can efficiently utilize knowledge from different datasets (domains), is therefore crucial for multi-modal tasks. In this paper, we focus on the Referring Expression Grounding (REG) task, which is to localize the image region described by a natural language expression. Specifically, we propose a novel relation-tailored approach to effectively transfer multi-modal knowledge for the REG problem. Our approach tackles the multi-modal domain adaptation problem by simultaneously enriching inter-domain relations and transferring relations between domains. Experiments show that the proposed approach significantly improves the transferability of multi-modal domains and enhances adaptation performance on the REG problem.
