LGCCMay 20, 2022

Tackling Provably Hard Representative Selection via Graph Neural Networks

arXiv:2205.10403v23 citationsh-index: 62
Originality Highly original
AI Analysis

This addresses a fundamental challenge in data selection for graph-based machine learning, offering a novel solution with broad applicability, though it builds on existing graph neural network methods.

The paper tackles the problem of selecting representative nodes from attributed graphs to optimize model accuracy, proving that a practical variant is hard to approximate without graph structure, but showing that with homophilous graphs, it becomes solvable using RS-GNN, which achieves significant improvements over baselines on eight benchmarks.

Representative Selection (RS) is the problem of finding a small subset of exemplars from a dataset that is representative of the dataset. In this paper, we study RS for attributed graphs, and focus on finding representative nodes that optimize the accuracy of a model trained on the selected representatives. Theoretically, we establish a new hardness result forRS (in the absence of a graph structure) by proving that a particular, highly practical variant of it (RS for Learning) is hard to approximate in polynomial time within any reasonable factor, which implies a significant potential gap between the optimum solution of widely-used surrogate functions and the actual accuracy of the model. We then study the setting where a (homophilous) graph structure is available, or can be constructed, between the data points.We show that with an appropriate modeling approach, the presence of such a structure can turn a hard RS (for learning) problem into one that can be effectively solved. To this end, we develop RS-GNN, a representation learning-based RS model based on Graph Neural Networks. Empirically, we demonstrate the effectiveness of RS-GNN on problems with predefined graph structures as well as problems with graphs induced from node feature similarities, by showing that RS-GNN achieves significant improvements over established baselines on a suite of eight benchmarks.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes