CV AI CL LG · Feb 17, 2021

Learning Visual Models using a Knowledge Graph as a Trainer

Sebastian Monka, Lavdim Halilaj, Stefan Schmid, Achim Rettinger

arXiv:2102.08747v28.021 citations · h-index: 51

Originality: Incremental advance

AI Analysis

This paper addresses domain shift in computer vision, with applications such as transfer learning, and offers a more robust training method. The advance appears incremental, since it builds on existing neuro-symbolic and contrastive learning techniques.

The paper tackles the problem of neural networks failing under domain shifts by proposing KG-NN, a neuro-symbolic approach that uses a knowledge graph as a trainer to supervise learning with image-data-invariant auxiliary knowledge. The results show that KG-NN outperforms cross-entropy-trained models in all experiments, particularly as the domain gap increases, with better performance and robustness to domain shifts.

Traditional computer vision approaches, based on neural networks (NN), are typically trained on a large amount of image data. By minimizing the cross-entropy loss between a prediction and a given class label, the NN and its visual embedding space are learned to fulfill a given task. However, because they depend solely on the image data distribution of the training domain, these models tend to fail when applied to a target domain that differs from their source domain. To learn an NN that is more robust to domain shifts, we propose the knowledge graph neural network (KG-NN), a neuro-symbolic approach that supervises the training using image-data-invariant auxiliary knowledge. The auxiliary knowledge is first encoded in a knowledge graph with respective concepts and their relationships, which is then transformed into a dense vector representation via an embedding method. Using a contrastive loss function, KG-NN learns to adapt its visual embedding space, and thus its weights, according to the image-data-invariant knowledge graph embedding space. We evaluate KG-NN on visual transfer learning tasks for classification using the mini-ImageNet dataset and its derivatives, as well as road sign recognition datasets from Germany and China. The results show that a visual model trained with a knowledge graph as a trainer outperforms a model trained with cross-entropy in all experiments, in particular when the domain gap increases. Besides better performance and stronger robustness to domain shifts, KG-NN adapts to multiple datasets and classes without suffering heavily from catastrophic forgetting.
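
The idea of pulling image embeddings toward fixed knowledge graph embeddings can be illustrated with a minimal NumPy sketch. This is not the authors' loss function: it assumes a simple softmax-over-cosine-similarity (InfoNCE-style) objective, and the function name kg_contrastive_loss, the temperature value, and the toy one-hot "knowledge graph embeddings" are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def kg_contrastive_loss(image_emb, kg_emb, labels, temperature=0.1):
    """Assumed InfoNCE-style loss (not the paper's exact formulation).

    image_emb: (batch, dim) visual embeddings produced by the network.
    kg_emb:    (num_classes, dim) fixed class embeddings from the knowledge graph.
    labels:    (batch,) integer class index of each image.
    """
    z = l2_normalize(image_emb)
    k = l2_normalize(kg_emb)
    logits = z @ k.T / temperature                       # cosine similarity / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-probability of each image's own class embedding.
    return float(-log_prob[np.arange(len(labels)), labels].mean())


# Toy stand-in for knowledge graph embeddings: three mutually orthogonal classes.
kg_emb = np.eye(3, 4)
labels = np.array([0, 1, 2, 1])

aligned = kg_emb[labels]               # images sit on their own class embedding
misaligned = kg_emb[(labels + 1) % 3]  # images sit on a wrong class embedding

loss_aligned = kg_contrastive_loss(aligned, kg_emb, labels)
loss_misaligned = kg_contrastive_loss(misaligned, kg_emb, labels)
print(loss_aligned, loss_misaligned)
```

In this toy setup the loss is small when image embeddings coincide with their class's knowledge graph embedding and large when they coincide with another class's. In training, the gradient of such a loss would update only the visual network, while the knowledge graph embeddings stay fixed and act as the image-data-invariant target space.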
