End-to-End Learning on Multimodal Knowledge Graphs
This work addresses the limitation of existing knowledge graph models that ignore raw multimodal data, offering a more comprehensive learning approach for data scientists, though it is incremental as it builds on existing message passing networks.
The authors tackled the problem of end-to-end learning on knowledge graphs by incorporating multimodal node features, such as numbers, texts, and images, into a unified model, achieving significant performance improvements in node classification and link prediction tasks, with results showing that including multimodal information can boost performance depending on data characteristics.
Knowledge graphs enable data scientists to learn end-to-end on heterogeneous knowledge. However, most end-to-end models solely learn from the relational information encoded in graphs' structure: raw values, encoded as literal nodes, are either omitted completely or treated as regular nodes without consideration for their values. In either case we lose potentially relevant information which could have otherwise been exploited by our learning methods. We propose a multimodal message passing network which not only learns end-to-end from the structure of graphs, but also from their possibly divers set of multimodal node features. Our model uses dedicated (neural) encoders to naturally learn embeddings for node features belonging to five different types of modalities, including numbers, texts, dates, images and geometries, which are projected into a joint representation space together with their relational information. We implement and demonstrate our model on node classification and link prediction for artificial and real-worlds datasets, and evaluate the effect that each modality has on the overall performance in an inverse ablation study. Our results indicate that end-to-end multimodal learning from any arbitrary knowledge graph is indeed possible, and that including multimodal information can significantly affect performance, but that much depends on the characteristics of the data.