LG · Jun 22, 2022
Agent-based Graph Neural Networks
Karolis Martinkus, Pál András Papp, Benedikt Schesch et al. · eth-zurich
We present a novel graph neural network we call AgentNet, which is designed specifically for graph-level tasks. AgentNet is inspired by sublinear algorithms, featuring a computational complexity that is independent of the graph size. The architecture of AgentNet differs fundamentally from the architectures of traditional graph neural networks. In AgentNet, some trained neural agents intelligently walk the graph, and then collectively decide on the output. We provide an extensive theoretical analysis of AgentNet: we show that the agents can learn to systematically explore their neighborhood and that AgentNet can distinguish some structures that are even indistinguishable by 2-WL. Moreover, AgentNet is able to separate any two graphs which are sufficiently different in terms of subgraphs. We confirm these theoretical results with synthetic experiments on hard-to-distinguish graphs and real-world graph classification tasks. In both cases, we compare favorably not only to standard GNNs but also to computationally more expensive GNN extensions.
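The walk-then-decide idea can be illustrated with a minimal sketch. It is not the AgentNet architecture: the uniformly random neighbor choice stands in for the learned transition, the mean observed degree stands in for the learned collective readout, and the names `run_agents` and `readout` are hypothetical. It only shows that the work is `num_agents * (num_steps + 1)` node visits, whatever the graph size.

```python
import random

def run_agents(adj, num_agents, num_steps, rng):
    """Let each agent walk the graph; returns one list of visited nodes per agent.

    Assumption: a uniformly random neighbor replaces the trained transition rule.
    """
    nodes = list(adj)
    traces = []
    for _ in range(num_agents):
        pos = rng.choice(nodes)            # random start node
        trace = [pos]
        for _ in range(num_steps):
            pos = rng.choice(adj[pos])     # placeholder for a learned move
            trace.append(pos)
        traces.append(trace)
    return traces

def readout(adj, traces):
    """Pool what all agents saw; here simply the mean degree of visited nodes."""
    degrees = [len(adj[v]) for trace in traces for v in trace]
    return sum(degrees) / len(degrees)

def cycle(n):
    """Adjacency lists of an n-node cycle."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

rng = random.Random(0)
small_traces = run_agents(cycle(10), num_agents=4, num_steps=8, rng=rng)
large_traces = run_agents(cycle(1000), num_agents=4, num_steps=8, rng=rng)

# Same number of node visits on both graphs, although one is 100x larger.
small_visits = sum(len(t) for t in small_traces)
large_visits = sum(len(t) for t in large_traces)
```

In this toy setting, both graphs cost 36 node visits, and the readout on any cycle is 2.0 because every node has degree 2.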
29.4 · LG · May 22
Approaching I/O-optimality for Approximate Attention
Pál András Papp, Aleksandros Sobczyk, Anastasios Zouzias
We revisit the I/O complexity of attention in large language models. Given query-key-value matrices $Q,K,V\in\mathbb{R}^{n\times d}$, and a machine with fast memory size $M$, the goal is to compute the "attention matrix" $A=\text{softmax}(QK^{\top}/\sqrt{d})V$ with the minimal number of data transfers between fast and slow memory. Existing methods in the literature, most notably FlashAttention and its variants, incur an I/O cost that depends quadratically on $n$, while a trivial lower bound only requires $\Omega(nd)$ I/Os to read the inputs and write the output. In this work, we present a technique for computing attention where the I/O cost only depends almost-linearly on $n$ in most parameter regimes. This is achieved by developing I/O-efficient algorithms inspired by the recent approximate attention framework of Alman and Song. We also prove corresponding lower bounds in each parameter regime to show that our algorithms are indeed close to I/O-optimal.
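For orientation, here is a small pure-Python sketch of the quantity $A=\text{softmax}(QK^{\top}/\sqrt{d})V$, computed two ways: by materializing each full row of scores, and by streaming keys and values in blocks with running softmax statistics, in the spirit of FlashAttention-style tiling. Both variants are exact attention and are not the approximate algorithm of the paper; the function names and the `block` parameter are illustrative assumptions.

```python
import math

def naive_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V, building the full score row for each query."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)                          # subtract max for stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out

def tiled_attention(Q, K, V, block=2):
    """Same output, but only `block` keys/values are touched at a time."""
    d = len(Q[0])
    out = []
    for q in Q:
        running_max = -math.inf
        running_sum = 0.0
        acc = [0.0] * len(V[0])
        for start in range(0, len(K), block):
            k_blk = K[start:start + block]
            v_blk = V[start:start + block]
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                      for k in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new maximum
            # (the factor is 0.0 on the first block, when running_max is -inf).
            scale = math.exp(running_max - new_max)
            running_sum *= scale
            acc = [a * scale for a in acc]
            for s, v in zip(scores, v_blk):
                w = math.exp(s - new_max)
                running_sum += w
                acc = [a + w * vj for a, vj in zip(acc, v)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out

Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
K = [[0.2, 0.1], [1.0, -1.0], [0.0, 0.3], [-0.7, 0.4], [0.9, 0.9]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

A_naive = naive_attention(Q, K, V)
A_tiled = tiled_attention(Q, K, V, block=2)
```

The two results agree up to floating-point rounding. The tiled variant still visits every query-key pair, which is why its cost remains quadratic in $n$.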
39.6 · DC · Apr 30
Replication in Graph Partitioning and Scheduling Problems
Pál András Papp, Toni Böhnlein, A. N. Yzelman
The efficient parallel execution of complex computations requires balancing the workload across processors while minimizing the communication between them. This inherent trade-off is often captured by graph partitioning or DAG scheduling problems. For the sake of model simplicity, most works on these problems assume that nodes can be assigned to only a single processor. However, in reality, replicating an operation on several processors can easily be beneficial: it may increase the computational costs only by a small amount, while significantly reducing the communication requirements. Our goal is to provide a comprehensive analysis of the impact of replication on partitioning and scheduling problems. On the theoretical side, we show that for graph partitioning, replication makes the problem significantly harder in terms of approximation complexity, whereas for scheduling, its impact on complexity seems less prominent. On the experimental side, we conduct a thorough analysis of the cost reduction obtainable by replication, on a wide range of graphs from real-world applications. For hypergraph partitioning, we use Integer Linear Programming (ILP) formulations to compare the optimal costs; our experiments show that replication can reduce the cost by 17%-65% on average, or even entirely remove the need for communication in some cases. For DAG scheduling, we similarly use ILPs on smaller graphs, and develop a sophisticated heuristic that is also applicable to much larger workloads. Our experiments here demonstrate a mean cost reduction of 11.61%-23.13% with replication, or even up to 58.17% in some cases.
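The trade-off described above can be made concrete on a toy instance. The sketch below uses a deliberately simple cost model that is an assumption of this example, not the model of the paper: every copy of a node costs one unit of compute, and a value must be sent to a processor once if that processor hosts a successor of the node but no copy of the node itself. The function `costs` and the placements are hypothetical.

```python
def costs(edges, placement):
    """Return (compute, communication) for a placement with optional replication.

    placement maps each node to the set of processors that hold a copy of it.
    Assumed toy model: unit compute per copy, one transfer per (value, target
    processor) pair that is needed but not locally available.
    """
    compute = sum(len(procs) for procs in placement.values())
    transfers = set()
    for u, v in edges:
        for p in placement[v]:
            if p not in placement[u]:
                transfers.add((u, p))   # u's output must be sent to p
    return compute, len(transfers)

# Node "a" feeds both "b" and "c", which run on different processors.
edges = [("a", "b"), ("a", "c")]

single = {"a": {0}, "b": {0}, "c": {1}}          # "a" computed once
replicated = {"a": {0, 1}, "b": {0}, "c": {1}}   # "a" computed on both

single_cost = costs(edges, single)
replicated_cost = costs(edges, replicated)
```

Here replicating `a` raises the compute cost from 3 to 4 units and removes the single transfer entirely.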
LG · Jan 30, 2022
A Theoretical Comparison of Graph Neural Network Extensions
Pál András Papp, Roger Wattenhofer
We study and compare different Graph Neural Network extensions that increase the expressive power of GNNs beyond the Weisfeiler-Leman test. We focus on (i) GNNs based on higher order WL methods, (ii) GNNs that preprocess small substructures in the graph, (iii) GNNs that preprocess the graph up to a small radius, and (iv) GNNs that slightly perturb the graph to compute an embedding. We begin by presenting a simple improvement for this last extension that strictly increases the expressive power of this GNN variant. Then, as our main result, we compare the expressiveness of these extensions to each other through a series of example constructions that can be distinguished by one of the extensions, but not by another one. We also show negative examples that are particularly challenging for each of the extensions, and we prove several claims about the ability of these extensions to count cliques and cycles in the graph.
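As background for the Weisfeiler-Leman baseline that these extensions aim to surpass, the following is a minimal sketch of 1-WL color refinement on a standard textbook pair, a 6-cycle versus two disjoint triangles. The example graphs and the function name `wl_histogram` are illustrative choices and are not taken from the paper.

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """Run 1-WL color refinement and return the histogram of final colors.

    Colors are nested tuples (own color, sorted neighbor colors), so they are
    directly comparable across different graphs without a relabeling step.
    """
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())

# A 6-cycle and two disjoint triangles: both are 2-regular on 6 nodes.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {
    0: [1, 2], 1: [0, 2], 2: [0, 1],
    3: [4, 5], 4: [3, 5], 5: [3, 4],
}

same_under_wl = wl_histogram(cycle6) == wl_histogram(two_triangles)
```

Every node has two neighbors with identical colors in every round, so the two histograms coincide and the refinement cannot tell the graphs apart, even though only one of them contains triangles.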
LG · Nov 11, 2021
DropGNN: Random Dropouts Increase the Expressiveness of Graph Neural Networks
Pál András Papp, Karolis Martinkus, Lukas Faber et al.
This paper studies Dropout Graph Neural Networks (DropGNNs), a new approach that aims to overcome the limitations of standard GNN frameworks. In DropGNNs, we execute multiple runs of a GNN on the input graph, with some of the nodes randomly and independently dropped in each of these runs. Then, we combine the results of these runs to obtain the final result. We prove that DropGNNs can distinguish various graph neighborhoods that cannot be separated by message passing GNNs. We derive theoretical bounds for the number of runs required to ensure a reliable distribution of dropouts, and we prove several properties regarding the expressive capabilities and limits of DropGNNs. We experimentally validate our theoretical findings on expressiveness. Furthermore, we show that DropGNNs perform competitively on established GNN benchmarks.
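The run-with-dropouts-then-combine scheme can be sketched as follows. This is a simplified illustration under stated assumptions: 1-WL color refinement stands in for the GNN, a dropped node is simply removed from the graph, and the runs are combined by counting how often each per-run color histogram occurs. The names `drop_nodes` and `dropout_runs` and the 6-cycle versus two-triangles example are hypothetical choices for this sketch.

```python
import random
from collections import Counter

def wl_histogram(adj, rounds=3):
    """Stand-in for a message-passing GNN: 1-WL refinement, final color histogram."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())

def drop_nodes(adj, dropped):
    """Remove the dropped nodes and every edge that touches them."""
    return {
        v: [u for u in nbrs if u not in dropped]
        for v, nbrs in adj.items() if v not in dropped
    }

def dropout_runs(adj, p, runs, rng):
    """Execute several runs, dropping each node independently with probability p.

    Returns how often each per-run histogram was observed.
    """
    results = Counter()
    for _ in range(runs):
        dropped = {v for v in adj if rng.random() < p}
        hist = wl_histogram(drop_nodes(adj, dropped))
        results[frozenset(hist.items())] += 1
    return results

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {
    0: [1, 2], 1: [0, 2], 2: [0, 1],
    3: [4, 5], 4: [3, 5], 5: [3, 4],
}

# Without dropouts the stand-in model sees identical graphs ...
no_drop_equal = wl_histogram(cycle6) == wl_histogram(two_triangles)
# ... but removing a single node already separates them.
one_drop_differs = (wl_histogram(drop_nodes(cycle6, {0}))
                    != wl_histogram(drop_nodes(two_triangles, {0})))

sampled = dropout_runs(cycle6, p=0.2, runs=50, rng=random.Random(0))
```

Dropping one node turns the 6-cycle into a 5-node path and the two triangles into a triangle plus a single edge, which the refinement separates after two rounds.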