LG, CY, ML (Jul 9, 2020)

Graph Convolutional Networks for Graphs Containing Missing Features

arXiv:2007.04583v2, 133 citations
AI Analysis

This work addresses a practical problem in real-world graph analysis, where node features are often incomplete. It offers a more robust and efficient alternative to the traditional two-stage approach of imputing the missing features first and applying a GCN afterwards.

The paper tackles the problem of missing node features in Graph Convolutional Networks (GCNs) by integrating missing-feature processing and graph learning into a single neural network, using a Gaussian Mixture Model to represent the missing data. It reports significant performance improvements over imputation-based methods in node classification and link prediction. When only a small fraction of features is missing, it even outperforms a GCN trained on complete features.

The Graph Convolutional Network (GCN) has experienced great success in graph analysis tasks. It works by smoothing the node features across the graph. Current GCN models overwhelmingly assume that the node feature information is complete. However, real-world graph data are often incomplete and contain missing features. Traditionally, the unknown features are first estimated and filled in using imputation techniques, and GCN is then applied. Because feature filling and graph learning are separate processes, performance is degraded and unstable. This problem becomes more serious when a large number of features are missing. We propose an approach that adapts GCN to graphs containing missing features. In contrast to the traditional strategy, our approach integrates the processing of missing features and graph learning within the same neural network architecture. Our idea is to represent the missing data by a Gaussian Mixture Model (GMM) and calculate the expected activation of neurons in the first hidden layer of GCN, while keeping the other layers of the network unchanged. This enables us to learn the GMM parameters and network weight parameters in an end-to-end manner. Notably, our approach does not increase the computational complexity of GCN, and it is consistent with GCN when the features are complete. We demonstrate through extensive experiments that our approach significantly outperforms the imputation-based methods in node classification and link prediction tasks. We show that the performance of our approach for the case with a low level of missing features is even superior to GCN for the case with complete features.
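The idea of taking an expected activation over a GMM can be illustrated with a small NumPy sketch. It is not the paper's implementation. It handles a single node, skips the graph propagation step, assumes diagonal covariances, and uses the mixture weights as given. The function names `expected_relu_gaussian` and `expected_first_layer` and all parameter names are hypothetical. The sketch relies on the standard closed form for a Gaussian passed through ReLU: for z ~ N(m, s^2), E[max(0, z)] = m * Phi(m / s) + s * phi(m / s), where Phi and phi are the standard normal CDF and PDF.

```python
import math
import numpy as np

# Vectorized error function; otypes keeps it safe for empty inputs.
_erf = np.vectorize(math.erf, otypes=[float])


def expected_relu_gaussian(mean, var):
    """Elementwise E[max(0, z)] for z ~ N(mean, var).

    Where var == 0 the result reduces to the ordinary ReLU of the mean.
    """
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    std = np.sqrt(var)
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero
    t = mean / safe_std
    pdf = np.exp(-0.5 * t ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + _erf(t / math.sqrt(2.0)))
    gaussian_case = mean * cdf + std * pdf
    return np.where(std > 0, gaussian_case, np.maximum(mean, 0.0))


def expected_first_layer(x, observed, W, b, pi, mu, var):
    """Expected ReLU activation of one dense layer for one node (assumed setup).

    x:        (d,) feature vector; entries at missing positions are ignored
    observed: (d,) boolean mask, True where the feature is known
    W, b:     (d, h) weights and (h,) bias of the first layer
    pi:       (K,) mixture weights summing to 1
    mu, var:  (K, d) per-component means and diagonal variances
    """
    missing = ~observed
    x_known = np.where(observed, x, 0.0)
    out = np.zeros(W.shape[1])
    for k in range(len(pi)):
        # Missing entries take the component mean and variance;
        # observed entries keep their value and have zero variance.
        feat_mean = x_known + np.where(missing, mu[k], 0.0)
        feat_var = np.where(missing, var[k], 0.0)
        pre_mean = feat_mean @ W + b      # mean of the pre-activation
        pre_var = feat_var @ (W ** 2)     # variance of the pre-activation
        out += pi[k] * expected_relu_gaussian(pre_mean, pre_var)
    return out
```

When every entry of `observed` is True, all variances vanish and the result equals `np.maximum(x @ W + b, 0)`, which mirrors the abstract's statement that the approach is consistent with GCN when the features are complete.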
