Nonparametric Teaching for Graph Property Learners
This work addresses the training cost of learning on graph-structured data, in applications such as molecular property prediction. It presents an incremental improvement in training efficiency through a new teaching framework.
The paper tackles the costly training of graph property learners such as Graph Convolutional Networks (GCNs) by proposing Graph Neural Teaching (GraNT), a nonparametric teaching paradigm that selects graph-property pairs to accelerate training. The reported reductions in training time include 36.62% for graph-level regression and 47.30% for node-level classification, with generalization performance maintained.
Inferring properties of graph-structured data, e.g., the solubility of molecules, essentially involves learning the implicit mapping from graphs to their properties. This learning process is often costly for graph property learners like Graph Convolutional Networks (GCNs). To address this, we propose a paradigm called Graph Neural Teaching (GraNT) that reinterprets the learning process through a novel nonparametric teaching perspective. Specifically, nonparametric teaching offers a theoretical framework for teaching implicitly defined (i.e., nonparametric) mappings via example selection. Such an implicit mapping is realized by a dense set of graph-property pairs, and the GraNT teacher selects a subset of them to promote faster convergence in GCN training. By analytically examining the impact of graph structure on parameter-based gradient descent during training, and by recasting the evolution of GCNs, which is shaped by parameter updates, through functional gradient descent in nonparametric teaching, we show for the first time that teaching graph property learners (i.e., GCNs) is consistent with teaching structure-aware nonparametric learners. These findings allow GraNT to improve the learning efficiency of the graph property learner, with significant reductions in training time for graph-level regression (-36.62%), graph-level classification (-38.19%), node-level regression (-30.97%) and node-level classification (-47.30%), all while maintaining its generalization performance.
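To make the idea of teaching via example selection concrete, the following is a minimal sketch and not the GraNT algorithm itself. It rests on several assumptions that the abstract does not confirm: the teacher is taken to be greedy and to pick the graph-property pairs with the largest current prediction error, the learner is reduced to a single linear propagation step with mean pooling in place of a full GCN, and the data are synthetic. All function names (`normalized_adjacency`, `graph_embedding`, `select_examples`, `train`, `make_toy_dataset`) are hypothetical.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_embedding(adj, feats):
    """One propagation step followed by mean pooling over nodes."""
    return (normalized_adjacency(adj) @ feats).mean(axis=0)

def select_examples(preds, targets, k):
    """Assumed greedy teacher: indices of the k largest absolute residuals."""
    return np.argsort(-np.abs(preds - targets))[:k]

def train(embeddings, targets, k, steps=200, lr=0.1):
    """Gradient descent on squared loss using only teacher-selected pairs."""
    w = np.zeros(embeddings.shape[1])
    for _ in range(steps):
        preds = embeddings @ w
        idx = select_examples(preds, targets, k)
        # Gradient of the mean squared error over the selected subset only.
        grad = embeddings[idx].T @ (preds[idx] - targets[idx]) / k
        w -= lr * grad
    return w

def make_toy_dataset(num_graphs=64, num_nodes=6, num_feats=4, seed=0):
    """Random undirected graphs whose property is linear in the pooled embedding."""
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=num_feats)
    embeddings = []
    for _ in range(num_graphs):
        upper = np.triu(rng.integers(0, 2, size=(num_nodes, num_nodes)), k=1)
        adj = (upper + upper.T).astype(float)
        feats = rng.normal(size=(num_nodes, num_feats))
        embeddings.append(graph_embedding(adj, feats))
    embeddings = np.stack(embeddings)
    return embeddings, embeddings @ true_w, true_w
```

Calling `emb, y, true_w = make_toy_dataset()` followed by `w = train(emb, y, k=8)` updates the learner using 8 of the 64 pairs per step. The point of the sketch is only the structure of the loop: the teacher inspects the learner's current outputs, selects a subset of pairs, and the learner takes its gradient step on that subset alone.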