IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment
This work addresses entity alignment across multi-modal knowledge graphs, a task that matters for integrating diverse data sources in AI applications. The contribution appears incremental: it builds on existing fusion methods and adds explicit handling of redundant information.
The paper tackles multi-modal entity alignment (MMEA) with a variational information bottleneck that suppresses alignment-irrelevant information and emphasizes alignment-relevant features in entity representations. The authors report that the model consistently outperforms state-of-the-art methods across multiple datasets, including in low-resource and high-noise scenarios.
Multi-modal entity alignment (MMEA) aims to identify equivalent entities between multi-modal knowledge graphs (MMKGs), where the entities can be associated with related images. Most existing studies integrate multi-modal information by relying heavily on an automatically learned fusion module, and rarely suppress redundant information for MMEA explicitly. To this end, we explore a variational information bottleneck for multi-modal entity alignment (IBMEA), which emphasizes alignment-relevant information and suppresses alignment-irrelevant information when generating entity representations. Specifically, we devise multi-modal variational encoders to generate modal-specific entity representations as probability distributions. Then, we propose four modal-specific information bottleneck regularizers, which limit misleading clues when refining the modal-specific entity representations. Finally, we propose a modal-hybrid information contrastive regularizer to integrate all the refined modal-specific representations, enhancing the entity similarity between MMKGs to achieve MMEA. We conduct extensive experiments on two cross-KG and three bilingual MMEA datasets. Experimental results demonstrate that our model consistently outperforms previous state-of-the-art methods and shows promising, robust performance in low-resource and high-noise data scenarios.
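The three ingredients named in the abstract (variational encoders, a bottleneck regularizer, and a contrastive alignment term) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact objectives, so the sketch assumes a diagonal-Gaussian encoder, the standard KL-to-standard-normal compression term used in generic variational information bottleneck models, and an InfoNCE-style contrastive loss. All function names, weight matrices, and the coefficient `beta` are hypothetical.

```python
import numpy as np

def encode_gaussian(x, w_mu, w_logvar):
    """Map features to the mean and log-variance of a diagonal Gaussian.
    A linear map stands in for a real modal-specific encoder (assumption)."""
    return x @ w_mu, x @ w_logvar

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """Per-entity KL(N(mu, sigma^2) || N(0, I)), the usual VIB compression term."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)

def info_nce(z_a, z_b, temperature=0.1):
    """Contrastive loss: row i of z_a should be most similar to row i of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy data: 8 aligned entity pairs, one modality, 16-dim features (all made up).
rng = np.random.default_rng(0)
x_kg1 = rng.standard_normal((8, 16))
x_kg2 = x_kg1 + 0.1 * rng.standard_normal((8, 16))  # noisy counterpart entities
w_mu = 0.1 * rng.standard_normal((16, 4))
w_logvar = 0.1 * rng.standard_normal((16, 4))

mu1, lv1 = encode_gaussian(x_kg1, w_mu, w_logvar)
mu2, lv2 = encode_gaussian(x_kg2, w_mu, w_logvar)
z1 = reparameterize(mu1, lv1, rng)
z2 = reparameterize(mu2, lv2, rng)

beta = 0.01  # hypothetical weight trading compression against alignment
compression = np.mean(kl_to_standard_normal(mu1, lv1) + kl_to_standard_normal(mu2, lv2))
total_loss = info_nce(z1, z2) + beta * compression
```

In this sketch the KL term pushes each representation toward an uninformative prior, so only features that lower the contrastive loss are worth keeping. A full model in the spirit of the abstract would repeat the encoder and regularizer per modality and apply the contrastive term to the fused representation.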