LG CCMay 20, 2022

Tackling Provably Hard Representative Selection via Graph Neural Networks

Mehran Kazemi, Anton Tsitsulin, Hossein Esfandiari, MohammadHossein Bateni, Deepak Ramachandran, Bryan Perozzi, Vahab Mirrokni

arXiv:2205.10403v23.33 citationsh-index: 62Has Code

Originality Highly original

AI Analysis

This addresses a fundamental challenge in data selection for graph-based machine learning, offering a novel solution with broad applicability, though it builds on existing graph neural network methods.

The paper tackles the problem of selecting representative nodes from attributed graphs to optimize model accuracy, proving that a practical variant is hard to approximate without graph structure, but showing that with homophilous graphs, it becomes solvable using RS-GNN, which achieves significant improvements over baselines on eight benchmarks.

Representative Selection (RS) is the problem of finding a small subset of exemplars from a dataset that is representative of the dataset. In this paper, we study RS for attributed graphs, and focus on finding representative nodes that optimize the accuracy of a model trained on the selected representatives. Theoretically, we establish a new hardness result forRS (in the absence of a graph structure) by proving that a particular, highly practical variant of it (RS for Learning) is hard to approximate in polynomial time within any reasonable factor, which implies a significant potential gap between the optimum solution of widely-used surrogate functions and the actual accuracy of the model. We then study the setting where a (homophilous) graph structure is available, or can be constructed, between the data points.We show that with an appropriate modeling approach, the presence of such a structure can turn a hard RS (for learning) problem into one that can be effectively solved. To this end, we develop RS-GNN, a representation learning-based RS model based on Graph Neural Networks. Empirically, we demonstrate the effectiveness of RS-GNN on problems with predefined graph structures as well as problems with graphs induced from node feature similarities, by showing that RS-GNN achieves significant improvements over established baselines on a suite of eight benchmarks.

View on arXiv PDF Code

Similar