A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training
This work addresses the integration of knowledge discovery and pre-training in multi-modal machine learning. The contribution is incremental: it combines existing concepts into a new iterative framework.
The paper links knowledge discovery with knowledge-guided multi-modal pre-training by proposing a unified continuous learning framework that iteratively improves both tasks. Experiments on the MS-COCO and Flickr30K datasets validate its effectiveness.
Multi-modal pre-training and knowledge discovery are two important research topics in multi-modal machine learning. Nevertheless, no existing work attempts to link knowledge discovery with knowledge-guided multi-modal pre-training. In this paper, we propose to unify them into a continuous learning framework for mutual improvement. Taking open-domain uni-modal datasets of images and texts as input, we maintain a knowledge graph as the foundation that supports both tasks. For knowledge discovery, a pre-trained model is used to identify cross-modal links on the graph. For model pre-training, the knowledge graph serves as external knowledge to guide model updates. These two steps are performed iteratively in our framework for continuous learning. Experimental results on MS-COCO and Flickr30K, covering both knowledge discovery and the pre-trained model, validate the effectiveness of our framework.
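The alternation between the two steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the names `KnowledgeGraph`, `ToyModel`, `discover_links`, and `continuous_learning` are hypothetical, the feature vectors are hand-written toy values, the scoring function is a weighted dot product, and the "pre-training" step is a simple weight nudge standing in for a real knowledge-guided objective.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy graph: image nodes, text nodes, and discovered cross-modal links."""
    images: dict  # image id -> feature vector (list of floats)
    texts: dict   # text id  -> feature vector (list of floats)
    links: set = field(default_factory=set)  # set of (image id, text id)


class ToyModel:
    """Hypothetical stand-in for a pre-trained multi-modal model."""

    def __init__(self, dim):
        self.w = [1.0] * dim  # one weight per feature dimension

    def score(self, img, txt):
        # Weighted dot product between an image and a text feature vector.
        return sum(w * a * b for w, a, b in zip(self.w, img, txt))

    def update(self, graph, lr=1.0):
        # "Knowledge-guided" update (assumed form): raise the weights of
        # dimensions on which already linked image-text pairs agree.
        for img_id, txt_id in graph.links:
            img, txt = graph.images[img_id], graph.texts[txt_id]
            self.w = [w + lr * a * b for w, a, b in zip(self.w, img, txt)]


def discover_links(model, graph, threshold):
    """Knowledge discovery: add image-text links the model scores highly."""
    new_links = set()
    for img_id, img in graph.images.items():
        for txt_id, txt in graph.texts.items():
            pair = (img_id, txt_id)
            if pair not in graph.links and model.score(img, txt) >= threshold:
                new_links.add(pair)
    graph.links |= new_links
    return new_links


def continuous_learning(model, graph, rounds=3, threshold=1.0):
    """Alternate discovery and model updating; return new links per round."""
    history = []
    for _ in range(rounds):
        new_links = discover_links(model, graph, threshold)
        model.update(graph)
        history.append(len(new_links))
    return history


# Toy data: dimensions are (dog-specific, cat-specific, shared "animal").
graph = KnowledgeGraph(
    images={"img_dog": [1.0, 0.0, 0.5], "img_cat": [0.0, 1.0, 0.5]},
    texts={"a dog": [1.0, 0.0, 0.5], "a cat": [0.0, 0.7, 0.5]},
)
model = ToyModel(dim=3)
history = continuous_learning(model, graph)
```

In this toy run the cat pair falls just below the threshold in the first round and is linked only after the model has been updated with the dog link, which is the kind of mutual improvement the framework aims for. A real system would replace the weight nudge with an actual pre-training objective and the exhaustive pair scan with a scalable candidate search.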