LGFeb 14, 2023
Understanding Oversquashing in GNNs through the Lens of Effective ResistanceMitchell Black, Zhengchao Wan, Amir Nayyeri et al.
Message passing graph neural networks (GNNs) are a popular learning architectures for graph-structured data. However, one problem GNNs experience is oversquashing, where a GNN has difficulty sending information between distant nodes. Understanding and mitigating oversquashing has recently received significant attention from the research community. In this paper, we continue this line of work by analyzing oversquashing through the lens of the effective resistance between nodes in the input graph. Effective resistance intuitively captures the ``strength'' of connection between two nodes by paths in the graph, and has a rich literature spanning many areas of graph theory. We propose to use total effective resistance as a bound of the total amount of oversquashing in a graph and provide theoretical justification for its use. We further develop an algorithm to identify edges to be added to an input graph to minimize the total effective resistance, thereby alleviating oversquashing. We provide empirical evidence of the effectiveness of our total effective resistance based rewiring strategies for improving the performance of GNNs.
LGJan 27, 2023
On the Connection Between MPNN and Graph TransformerChen Cai, Truong Son Hy, Rose Yu et al.
Graph Transformer (GT) recently has emerged as a new paradigm of graph learning algorithms, outperforming the previously popular Message Passing Neural Network (MPNN) on multiple benchmarks. Previous work (Kim et al., 2022) shows that with proper position embedding, GT can approximate MPNN arbitrarily well, implying that GT is at least as powerful as MPNN. In this paper, we study the inverse connection and show that MPNN with virtual node (VN), a commonly used heuristic with little theoretical understanding, is powerful enough to arbitrarily approximate the self-attention layer of GT. In particular, we first show that if we consider one type of linear transformer, the so-called Performer/Linear Transformer (Choromanski et al., 2020; Katharopoulos et al., 2020), then MPNN + VN with only O(1) depth and O(1) width can approximate a self-attention layer in Performer/Linear Transformer. Next, via a connection between MPNN + VN and DeepSets, we prove the MPNN + VN with O(n^d) width and O(1) depth can approximate the self-attention layer arbitrarily well, where d is the input feature dimension. Lastly, under some assumptions, we provide an explicit construction of MPNN + VN with O(1) width and O(n) depth approximating the self-attention layer in GT arbitrarily well. On the empirical side, we demonstrate that 1) MPNN + VN is a surprisingly strong baseline, outperforming GT on the recently proposed Long Range Graph Benchmark (LRGB) dataset, 2) our MPNN + VN improves over early implementation on a wide range of OGB datasets and 3) MPNN + VN outperforms Linear Transformer and MPNN on the climate modeling task.
LGOct 31, 2022
The Numerical Stability of Hyperbolic Representation LearningGal Mishne, Zhengchao Wan, Yusu Wang et al.
Given the exponential growth of the volume of the ball w.r.t. its radius, the hyperbolic space is capable of embedding trees with arbitrarily small distortion and hence has received wide attention for representing hierarchical datasets. However, this exponential growth property comes at a price of numerical instability such that training hyperbolic learning models will sometimes lead to catastrophic NaN problems, encountering unrepresentable values in floating point arithmetic. In this work, we carefully analyze the limitation of two popular models for the hyperbolic space, namely, the Poincaré ball and the Lorentz model. We first show that, under the 64 bit arithmetic system, the Poincaré ball has a relatively larger capacity than the Lorentz model for correctly representing points. Then, we theoretically validate the superiority of the Lorentz model over the Poincaré ball from the perspective of optimization. Given the numerical limitations of both models, we identify one Euclidean parametrization of the hyperbolic space which can alleviate these limitations. We further extend this Euclidean parametrization to hyperbolic hyperplanes and exhibits its ability in improving the performance of hyperbolic SVM.
MLJan 6, 2023
Principal Component Analysis in Space FormsPuoya Tabaghi, Michael Khanzadeh, Yusu Wang et al.
Principal Component Analysis (PCA) is a workhorse of modern data science. While PCA assumes the data conforms to Euclidean geometry, for specific data types, such as hierarchical and cyclic data structures, other spaces are more appropriate. We study PCA in space forms; that is, those with constant curvatures. At a point on a Riemannian manifold, we can define a Riemannian affine subspace based on a set of tangent vectors. Finding the optimal low-dimensional affine subspace for given points in a space form amounts to dimensionality reduction. Our Space Form PCA (SFPCA) seeks the affine subspace that best represents a set of manifold-valued points with the minimum projection cost. We propose proper cost functions that enjoy two properties: (1) their optimal affine subspace is the solution to an eigenequation, and (2) optimal affine subspaces of different dimensions form a nested set. These properties provide advances over existing methods, which are mostly iterative algorithms with slow convergence and weaker theoretical guarantees. We evaluate the proposed SFPCA on real and simulated data in spherical and hyperbolic spaces. We show that it outperforms alternative methods in estimating true subspaces (in simulated data) with respect to convergence speed or accuracy, often both.
LGNov 7, 2022
Implicit Graphon Neural RepresentationXinyue Xia, Gal Mishne, Yusu Wang
Graphons are general and powerful models for generating graphs of varying size. In this paper, we propose to directly model graphons using neural networks, obtaining Implicit Graphon Neural Representation (IGNR). Existing work in modeling and reconstructing graphons often approximates a target graphon by a fixed resolution piece-wise constant representation. Our IGNR has the benefit that it can represent graphons up to arbitrary resolutions, and enables natural and efficient generation of arbitrary sized graphs with desired structure once the model is learned. Furthermore, we allow the input graph data to be unaligned and have different sizes by leveraging the Gromov-Wasserstein distance. We first demonstrate the effectiveness of our model by showing its superior performance on a graphon learning task. We then propose an extension of IGNR that can be incorporated into an auto-encoder framework, and demonstrate its good performance under a more general setting of graphon learning. We also show that our model is suitable for graph representation learning and graph generation.
LGFeb 24Code
MINAR: Mechanistic Interpretability for Neural Algorithmic ReasoningJesse He, Helen Jenne, Max Vargas et al.
The recent field of neural algorithmic reasoning (NAR) studies the ability of graph neural networks (GNNs) to emulate classical algorithms like Bellman-Ford, a phenomenon known as algorithmic alignment. At the same time, recent advances in large language models (LLMs) have spawned the study of mechanistic interpretability, which aims to identify granular model components like circuits that perform specific computations. In this work, we introduce Mechanistic Interpretability for Neural Algorithmic Reasoning (MINAR), an efficient circuit discovery toolbox that adapts attribution patching methods from mechanistic interpretability to the GNN setting. We show through two case studies that MINAR recovers faithful neuron-level circuits from GNNs trained on algorithmic tasks. Our study sheds new light on the process of circuit formation and pruning during training, as well as giving new insight into how GNNs trained to perform multiple tasks in parallel reuse circuit components for related tasks. Our code is available at https://github.com/pnnl/MINAR.
LGFeb 1, 2023
The Weisfeiler-Lehman Distance: Reinterpretation and Connection with GNNsSamantha Chen, Sunhyuk Lim, Facundo Mémoli et al.
In this paper, we present a novel interpretation of the so-called Weisfeiler-Lehman (WL) distance, introduced by Chen et al. (2022), using concepts from stochastic processes. The WL distance aims at comparing graphs with node features, has the same discriminative power as the classic Weisfeiler-Lehman graph isomorphism test and has deep connections to the Gromov-Wasserstein distance. This new interpretation connects the WL distance to the literature on distances for stochastic processes, which also makes the interpretation of the distance more accessible and intuitive. We further explore the connections between the WL distance and certain Message Passing Neural Networks, and discuss the implications of the WL distance for understanding the Lipschitz property and the universal approximation results for these networks.
LGOct 21, 2022
Learning Ultrametric Trees for Optimal Transport RegressionSamantha Chen, Puoya Tabaghi, Yusu Wang
Optimal transport provides a metric which quantifies the dissimilarity between probability measures. For measures supported in discrete metric spaces, finding the optimal transport distance has cubic time complexity in the size of the space. However, measures supported on trees admit a closed-form optimal transport that can be computed in linear time. In this paper, we aim to find an optimal tree structure for a given discrete metric space so that the tree-Wasserstein distance approximates the optimal transport distance in the original space. One of our key ideas is to cast the problem in ultrametric spaces. This helps us optimize over the space of ultrametric trees -- a mixed-discrete and continuous optimization problem -- via projected gradient decent over the space of ultrametric matrices. During optimization, we project the parameters to the ultrametric space via a hierarchical minimum spanning tree algorithm, equivalent to the closest projection to ultrametrics under the supremum norm. Experimental results on real datasets show that our approach outperforms previous approaches (e.g. Flowtree, Quadtree) in approximating optimal transport distances. Finally, experiments on synthetic data generated on ground truth trees show that our algorithm can accurately uncover the underlying trees.
LGFeb 16, 2023
Distances for Markov Chains, and Their DifferentiationTristan Brugère, Zhengchao Wan, Yusu Wang
(Directed) graphs with node attributes are a common type of data in various applications and there is a vast literature on developing metrics and efficient algorithms for comparing them. Recently, in the graph learning and optimization communities, a range of new approaches have been developed for comparing graphs with node attributes, leveraging ideas such as the Optimal Transport (OT) and the Weisfeiler-Lehman (WL) graph isomorphism test. Two state-of-the-art representatives are the OTC distance proposed in (O'Connor et al., 2022) and the WL distance in (Chen et al., 2022). Interestingly, while these two distances are developed based on different ideas, we observe that they both view graphs as Markov chains, and are deeply connected. Indeed, in this paper, we propose a unified framework to generate distances for Markov chains (thus including (directed) graphs with node attributes), which we call the Optimal Transport Markov (OTM) distances, that encompass both the OTC and the WL distances. We further introduce a special one-parameter family of distances within our OTM framework, called the discounted WL distance. We show that the discounted WL distance has nice theoretical properties and can address several limitations of the existing OTC and WL distances. Furthermore, contrary to the OTC and the WL distances, our new discounted WL distance can be differentiated after a entropy-regularization similar to the Sinkhorn distance, making it suitable to use in learning frameworks, e.g., as the reconstruction loss in a graph generative model.
LGAug 1, 2023
Neural approximation of Wasserstein distance via a universal architecture for symmetric and factorwise group invariant functionsSamantha Chen, Yusu Wang
Learning distance functions between complex objects, such as the Wasserstein distance to compare point sets, is a common goal in machine learning applications. However, functions on such complex objects (e.g., point sets and graphs) are often required to be invariant to a wide variety of group actions e.g. permutation or rigid transformation. Therefore, continuous and symmetric product functions (such as distance functions) on such complex objects must also be invariant to the product of such group actions. We call these functions symmetric and factor-wise group invariant (or SFGI functions in short). In this paper, we first present a general neural network architecture for approximating SFGI functions. The main contribution of this paper combines this general neural network with a sketching idea to develop a specific and efficient neural network which can approximate the $p$-th Wasserstein distance between point sets. Very importantly, the required model complexity is independent of the sizes of input point sets. On the theoretical front, to the best of our knowledge, this is the first result showing that there exists a neural network with the capacity to approximate Wasserstein distance with bounded model complexity. Our work provides an interesting integration of sketching ideas for geometric problems with universal approximation of symmetric functions. On the empirical front, we present a range of results showing that our newly proposed neural network architecture performs comparatively or better than other models (including a SOTA Siamese Autoencoder based approach). In particular, our neural network generalizes significantly better and trains much faster than the SOTA Siamese AE. Finally, this line of investigation could be useful in exploring effective neural network design for solving a broad range of geometric optimization problems (e.g., $k$-means in a metric space).
LGJun 6, 2022
On the Convergence of Optimizing Persistent-Homology-Based LossesYikai Zhang, Jiachen Yao, Yusu Wang et al.
Topological loss based on persistent homology has shown promise in various applications. A topological loss enforces the model to achieve certain desired topological property. Despite its empirical success, less is known about the optimization behavior of the loss. In fact, the topological loss involves combinatorial configurations that may oscillate during optimization. In this paper, we introduce a general purpose regularized topology-aware loss. We propose a novel regularization term and also modify existing topological loss. These contributions lead to a new loss function that not only enforces the model to have desired topological behavior, but also achieves satisfying convergence behavior. Our main theoretical result guarantees that the loss can be optimized efficiently, under mild assumptions.
LGOct 20, 2023
Universal Representation of Permutation-Invariant Functions on Vectors and TensorsPuoya Tabaghi, Yusu Wang
A main object of our study is multiset functions -- that is, permutation-invariant functions over inputs of varying sizes. Deep Sets, proposed by \cite{zaheer2017deep}, provides a \emph{universal representation} for continuous multiset functions on scalars via a sum-decomposable model. Restricting the domain of the functions to finite multisets of $D$-dimensional vectors, Deep Sets also provides a \emph{universal approximation} that requires a latent space dimension of $O(N^D)$ -- where $N$ is an upper bound on the size of input multisets. In this paper, we strengthen this result by proving that universal representation is guaranteed for continuous and discontinuous multiset functions though a latent space dimension of $O(N^D)$. We then introduce \emph{identifiable} multisets for which we can uniquely label their elements using an identifier function, namely, finite-precision vectors are identifiable. Using our analysis on identifiable multisets, we prove that a sum-decomposable model for general continuous multiset functions only requires a latent dimension of $2DN$. We further show that both encoder and decoder functions of the model are continuous -- our main contribution to the existing work which lack such a guarantee. Also this provides a significant improvement over the aforementioned $O(N^D)$ bound which was derived for universal representation of continuous and discontinuous multiset functions. We then extend our results and provide special sum-decomposition structures to universally represent permutation-invariant tensor functions on identifiable tensors. These families of sum-decomposition models enables us to design deep network architectures and deploy them on a variety of learning tasks on sequences, images, and graphs.
LGNov 24, 2023
Cycle Invariant Positional Encoding for Graph Representation LearningZuoyu Yan, Tengfei Ma, Liangcai Gao et al.
Cycles are fundamental elements in graph-structured data and have demonstrated their effectiveness in enhancing graph learning models. To encode such information into a graph learning framework, prior works often extract a summary quantity, ranging from the number of cycles to the more sophisticated persistence diagram summaries. However, more detailed information, such as which edges are encoded in a cycle, has not yet been used in graph neural networks. In this paper, we make one step towards addressing this gap, and propose a structure encoding module, called CycleNet, that encodes cycle information via edge structure encoding in a permutation invariant manner. To efficiently encode the space of all cycles, we start with a cycle basis (i.e., a minimal set of cycles generating the cycle space) which we compute via the kernel of the 1-dimensional Hodge Laplacian of the input graph. To guarantee the encoding is invariant w.r.t. the choice of cycle basis, we encode the cycle information via the orthogonal projector of the cycle basis, which is inspired by BasisNet proposed by Lim et al. We also develop a more efficient variant which however requires that the input graph has a unique shortest cycle basis. To demonstrate the effectiveness of the proposed module, we provide some theoretical understandings of its expressive power. Moreover, we show via a range of experiments that networks enhanced by our CycleNet module perform better in various benchmarks compared to several existing SOTA models.
LGMar 30, 2024Code
DE-HNN: An effective neural model for Circuit Netlist representationZhishang Luo, Truong Son Hy, Puoya Tabaghi et al.
The run-time for optimization tools used in chip design has grown with the complexity of designs to the point where it can take several days to go through one design cycle which has become a bottleneck. Designers want fast tools that can quickly give feedback on a design. Using the input and output data of the tools from past designs, one can attempt to build a machine learning model that predicts the outcome of a design in significantly shorter time than running the tool. The accuracy of such models is affected by the representation of the design data, which is usually a netlist that describes the elements of the digital circuit and how they are connected. Graph representations for the netlist together with graph neural networks have been investigated for such models. However, the characteristics of netlists pose several challenges for existing graph learning frameworks, due to the large number of nodes and the importance of long-range interactions between nodes. To address these challenges, we represent the netlist as a directed hypergraph and propose a Directional Equivariant Hypergraph Neural Network (DE-HNN) for the effective learning of (directed) hypergraphs. Theoretically, we show that our DE-HNN can universally approximate any node or hyperedge based function that satisfies certain permutation equivariant and invariant properties natural for directed hypergraphs. We compare the proposed DE-HNN with several State-of-the-art (SOTA) machine learning models for (hyper)graphs and netlists, and show that the DE-HNN significantly outperforms them in predicting the outcome of optimized place-and-route tools directly from the input netlists. Our source code and the netlists data used are publicly available at https://github.com/YusuLab/chips.git
LGFeb 13
Which Algorithms Can Graph Neural Networks Learn?Solveig Wittig, Antonis Vasileiou, Robert R. Nerem et al.
In recent years, there has been growing interest in understanding neural architectures' ability to learn to execute discrete algorithms, a line of work often referred to as neural algorithmic reasoning. The goal is to integrate algorithmic reasoning capabilities into larger neural pipelines. Many such architectures are based on (message-passing) graph neural networks (MPNNs), owing to their permutation equivariance and ability to deal with sparsity and variable-sized inputs. However, existing work is either largely empirical and lacks formal guarantees or it focuses solely on expressivity, leaving open the question of when and how such architectures generalize beyond a finite training set. In this work, we propose a general theoretical framework that characterizes the sufficient conditions under which MPNNs can learn an algorithm from a training set of small instances and provably approximate its behavior on inputs of arbitrary size. Our framework applies to a broad class of algorithms, including single-source shortest paths, minimum spanning trees, and general dynamic programming problems, such as the $0$-$1$ knapsack problem. In addition, we establish impossibility results for a wide range of algorithmic tasks, showing that standard MPNNs cannot learn them, and we derive more expressive MPNN-like architectures that overcome these limitations. Finally, we refine our analysis for the Bellman-Ford algorithm, yielding a substantially smaller required training set and significantly extending the recent work of Nerem et al. [2025] by allowing for a differentiable regularization loss. Empirical results largely support our theoretical findings.
LGFeb 14, 2024
Position: Topological Deep Learning is the New Frontier for Relational LearningTheodore Papamarkou, Tolga Birdal, Michael Bronstein et al.
Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL is the new frontier for relational learning. TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities. At the same time, this paper serves as an invitation to the scientific community to actively participate in TDL research to unlock the potential of this emerging field.
LGFeb 22, 2024
Comparing Graph Transformers via Positional EncodingsMitchell Black, Zhengchao Wan, Gal Mishne et al.
The distinguishing power of graph transformers is closely tied to the choice of positional encoding: features used to augment the base transformer with information about the graph. There are two primary types of positional encoding: absolute positional encodings (APEs) and relative positional encodings (RPEs). APEs assign features to each node and are given as input to the transformer. RPEs instead assign a feature to each pair of nodes, e.g., graph distance, and are used to augment the attention block. A priori, it is unclear which method is better for maximizing the power of the resulting graph transformer. In this paper, we aim to understand the relationship between these different types of positional encodings. Interestingly, we show that graph transformers using APEs and RPEs are equivalent in terms of distinguishing power. In particular, we demonstrate how to interchange APEs and RPEs while maintaining their distinguishing power in terms of graph transformers. Based on our theoretical results, we provide a study on several APEs and RPEs (including the resistance distance and the recently introduced stable and expressive positional encoding (SPE)) and compare their distinguishing power in terms of transformers. We believe our work will help navigate the huge number of choices of positional encoding and will provide guidance on the future design of positional encodings for graph transformers.
AIDec 17, 2023
NN-Steiner: A Mixed Neural-algorithmic Approach for the Rectilinear Steiner Minimum Tree ProblemAndrew B. Kahng, Robert R. Nerem, Yusu Wang et al.
Recent years have witnessed rapid advances in the use of neural networks to solve combinatorial optimization problems. Nevertheless, designing the "right" neural model that can effectively handle a given optimization problem can be challenging, and often there is no theoretical understanding or justification of the resulting neural model. In this paper, we focus on the rectilinear Steiner minimum tree (RSMT) problem, which is of critical importance in IC layout design and as a result has attracted numerous heuristic approaches in the VLSI literature. Our contributions are two-fold. On the methodology front, we propose NN-Steiner, which is a novel mixed neural-algorithmic framework for computing RSMTs that leverages the celebrated PTAS algorithmic framework of Arora to solve this problem (and other geometric optimization problems). Our NN-Steiner replaces key algorithmic components within Arora's PTAS by suitable neural components. In particular, NN-Steiner only needs four neural network (NN) components that are called repeatedly within an algorithmic framework. Crucially, each of the four NN components is only of bounded size independent of input size, and thus easy to train. Furthermore, as the NN component is learning a generic algorithmic step, once learned, the resulting mixed neural-algorithmic framework generalizes to much larger instances not seen in training. Our NN-Steiner, to our best knowledge, is the first neural architecture of bounded size that has capacity to approximately solve RSMT (and variants). On the empirical front, we show how NN-Steiner can be implemented and demonstrate the effectiveness of our resulting approach, especially in terms of generalization, by comparing with state-of-the-art methods (both neural and non-neural based).
LGApr 4, 2024
On the Theoretical Expressive Power and the Design Space of Higher-Order Graph TransformersCai Zhou, Rose Yu, Yusu Wang · mit
Graph transformers have recently received significant attention in graph learning, partly due to their ability to capture more global interaction via self-attention. Nevertheless, while higher-order graph neural networks have been reasonably well studied, the exploration of extending graph transformers to higher-order variants is just starting. Both theoretical understanding and empirical results are limited. In this paper, we provide a systematic study of the theoretical expressive power of order-$k$ graph transformers and sparse variants. We first show that, an order-$k$ graph transformer without additional structural information is less expressive than the $k$-Weisfeiler Lehman ($k$-WL) test despite its high computational cost. We then explore strategies to both sparsify and enhance the higher-order graph transformers, aiming to improve both their efficiency and expressiveness. Indeed, sparsification based on neighborhood information can enhance the expressive power, as it provides additional information about input graph structures. In particular, we show that a natural neighborhood-based sparse order-$k$ transformer model is not only computationally efficient, but also expressive -- as expressive as $k$-WL test. We further study several other sparse graph attention models that are computationally efficient and provide their expressiveness analysis. Finally, we provide experimental results to show the effectiveness of the different sparsification strategies.
LGMar 24, 2025
Graph neural networks extrapolate out-of-distribution for shortest pathsRobert R. Nerem, Samantha Chen, Sanjoy Dasgupta et al.
Neural networks (NNs), despite their success and wide adoption, still struggle to extrapolate out-of-distribution (OOD), i.e., to inputs that are not well-represented by their training dataset. Addressing the OOD generalization gap is crucial when models are deployed in environments significantly different from the training set, such as applying Graph Neural Networks (GNNs) trained on small graphs to large, real-world graphs. One promising approach for achieving robust OOD generalization is the framework of neural algorithmic alignment, which incorporates ideas from classical algorithms by designing neural architectures that resemble specific algorithmic paradigms (e.g. dynamic programming). The hope is that trained models of this form would have superior OOD capabilities, in much the same way that classical algorithms work for all instances. We rigorously analyze the role of algorithmic alignment in achieving OOD generalization, focusing on graph neural networks (GNNs) applied to the canonical shortest path problem. We prove that GNNs, trained to minimize a sparsity-regularized loss over a small set of shortest path instances, exactly implement the Bellman-Ford (BF) algorithm for shortest paths. In fact, if a GNN minimizes this loss within an error of $ε$, it implements the BF algorithm with an error of $O(ε)$. Consequently, despite limited training data, these GNNs are guaranteed to extrapolate to arbitrary shortest-path problems, including instances of any size. Our empirical results support our theory by showing that NNs trained by gradient descent are able to minimize this loss and extrapolate in practice.
LGJan 15, 2025
Enhancing Graph Representation Learning with Localized Topological FeaturesZuoyu Yan, Qi Zhao, Ze Ye et al.
Representation learning on graphs is a fundamental problem that can be crucial in various tasks. Graph neural networks, the dominant approach for graph representation learning, are limited in their representation power. Therefore, it can be beneficial to explicitly extract and incorporate high-order topological and geometric information into these models. In this paper, we propose a principled approach to extract the rich connectivity information of graphs based on the theory of persistent homology. Our method utilizes the topological features to enhance the representation learning of graph neural networks and achieve state-of-the-art performance on various node classification and link prediction benchmarks. We also explore the option of end-to-end learning of the topological features, i.e., treating topological computation as a differentiable operator during learning. Our theoretical analysis and empirical study provide insights and potential guidelines for employing topological features in graph learning tasks.
LGFeb 11
General and Efficient Steering of Unconditional DiffusionQingsong Wang, Mikhail Belkin, Yusu Wang
Guiding unconditional diffusion models typically requires either retraining with conditional inputs or per-step gradient computations (e.g., classifier-based guidance), both of which incur substantial computational overhead. We present a general recipe for efficiently steering unconditional diffusion {without gradient guidance during inference}, enabling fast controllable generation. Our approach is built on two observations about diffusion model structure: Noise Alignment: even in early, highly corrupted stages, coarse semantic steering is possible using a lightweight, offline-computed guidance signal, avoiding any per-step or per-sample gradients. Transferable concept vectors: a concept direction in activation space once learned transfers across both {timesteps} and {samples}; the same fixed steering vector learned near low noise level remains effective when injected at intermediate noise levels for every generation trajectory, providing refined conditional control with efficiency. Such concept directions can be efficiently and reliably identified via Recursive Feature Machine (RFM), a light-weight backpropagation-free feature learning method. Experiments on CIFAR-10, ImageNet, and CelebA demonstrate improved accuracy/quality over gradient-based guidance, while achieving significant inference speedups.
AIMar 5
AI+HW 2035: Shaping the Next DecadeDeming Chen, Jason Cong, Azalia Mirhoseini et al.
Artificial intelligence (AI) and hardware (HW) are advancing at unprecedented rates, yet their trajectories have become inseparably intertwined. The global research community lacks a cohesive, long-term vision to strategically coordinate the development of AI and HW. This fragmentation constrains progress toward holistic, sustainable, and adaptive AI systems capable of learning, reasoning, and operating efficiently across cloud, edge, and physical environments. The future of AI depends not only on scaling intelligence, but on scaling efficiency, achieving exponential gains in intelligence per joule, rather than unbounded compute consumption. Addressing this grand challenge requires rethinking the entire computing stack. This vision paper lays out a 10-year roadmap for AI+HW co-design and co-development, spanning algorithms, architectures, systems, and sustainability. We articulate key insights that redefine scaling around energy efficiency, system-level integration, and cross-layer optimization. We identify key challenges and opportunities, candidly assess potential obstacles and pitfalls, and propose integrated solutions grounded in algorithmic innovation, hardware advances, and software abstraction. Looking ahead, we define what success means in 10 years: achieving a 1000x improvement in efficiency for AI training and inference; enabling energy-aware, self-optimizing systems that seamlessly span cloud, edge, and physical AI; democratizing access to advanced AI infrastructure; and embedding human-centric principles into the design of intelligent systems. Finally, we outline concrete action items for academia, industry, government, and the broader community, calling for coordinated national initiatives, shared infrastructure, workforce development, cross-agency collaboration, and sustained public-private partnerships to ensure that AI+HW co-design becomes a unifying long-term mission.
LGAug 1, 2025
Explaining GNN Explanations with Edge GradientsJesse He, Akbar Rafiey, Gal Mishne et al.
In recent years, the remarkable success of graph neural networks (GNNs) on graph-structured data has prompted a surge of methods for explaining GNN predictions. However, the state-of-the-art for GNN explainability remains in flux. Different comparisons find mixed results for different methods, with many explainers struggling on more complex GNN architectures and tasks. This presents an urgent need for a more careful theoretical analysis of competing GNN explanation methods. In this work we take a closer look at GNN explanations in two different settings: input-level explanations, which produce explanatory subgraphs of the input graph, and layerwise explanations, which produce explanatory subgraphs of the computation graph. We establish the first theoretical connections between the popular perturbation-based and classical gradient-based methods, as well as point out connections between other recently proposed methods. At the input level, we demonstrate conditions under which GNNExplainer can be approximated by a simple heuristic based on the sign of the edge gradients. In the layerwise setting, we point out that edge gradients are equivalent to occlusion search for linear GNNs. Finally, we demonstrate how our theoretical results manifest in practice with experiments on both synthetic and real datasets.
NCMay 12, 2025
Skeletonization of neuronal processes using Discrete Morse techniques from computational topologySamik Banerjee, Caleb Stam, Daniel J. Tward et al.
To understand biological intelligence we need to map neuronal networks in vertebrate brains. Mapping mesoscale neural circuitry is done using injections of tracers that label groups of neurons whose axons project to different brain regions. Since many neurons are labeled, it is difficult to follow individual axons. Previous approaches have instead quantified the regional projections using the total label intensity within a region. However, such a quantification is not biologically meaningful. We propose a new approach better connected to the underlying neurons by skeletonizing labeled axon fragments and then estimating a volumetric length density. Our approach uses a combination of deep nets and the Discrete Morse (DM) technique from computational topology. This technique takes into account nonlocal connectivity information and therefore provides noise-robustness. We demonstrate the utility and scalability of the approach on whole-brain tracer injected data. We also define and illustrate an information theoretic measure that quantifies the additional information obtained, compared to the skeletonized tracer injection fragments, when individual axon morphologies are available. Our approach is the first application of the DM technique to computational neuroanatomy. It can help bridge between single-axon skeletons and tracer injections, two important data types in mapping neural networks in vertebrates.
LGDec 25, 2024
Elucidating Flow Matching ODE Dynamics with Respect to Data Geometries and DenoisersZhengchao Wan, Qingsong Wang, Gal Mishne et al.
Flow matching (FM) models extend ODE sampler based diffusion models into a general framework, significantly reducing sampling steps through learned vector fields. However, the theoretical understanding of FM models, particularly how their sample trajectories interact with underlying data geometry, remains underexplored. A rigorous theoretical analysis of FM ODE is essential for sample quality, stability, and broader applicability. In this paper, we advance the theory of FM models through a comprehensive analysis of sample trajectories. Central to our theory is the discovery that the denoiser, a key component of FM models, guides ODE dynamics through attracting and absorbing behaviors that adapt to the data geometry. We identify and analyze the three stages of ODE evolution: in the initial and intermediate stages, trajectories move toward the mean and local clusters of the data. At the terminal stage, we rigorously establish the convergence of FM ODE under weak assumptions, addressing scenarios where the data lie on a low-dimensional submanifold-cases that previous results could not handle. Our terminal stage analysis offers insights into the memorization phenomenon and establishes equivariance properties of FM ODEs. These findings bridge critical gaps in understanding flow matching models, with practical implications for optimizing sampling strategies and architectures guided by the intrinsic geometry of data.
LGFeb 5, 2022
Weisfeiler-Lehman meets Gromov-WassersteinSamantha Chen, Sunhyuk Lim, Facundo Mémoli et al.
The Weisfeiler-Lehman (WL) test is a classical procedure for graph isomorphism testing. The WL test has also been widely used both for designing graph kernels and for analyzing graph neural networks. In this paper, we propose the Weisfeiler-Lehman (WL) distance, a notion of distance between labeled measure Markov chains (LMMCs), of which labeled graphs are special cases. The WL distance is polynomial time computable and is also compatible with the WL test in the sense that the former is positive if and only if the WL test can distinguish the two involved graphs. The WL distance captures and compares subtle structures of the underlying LMMCs and, as a consequence of this, it is more discriminating than the distance between graphs used for defining the state-of-the-art Wasserstein Weisfeiler-Lehman graph kernel. Inspired by the structure of the WL distance we identify a neural network architecture on LMMCs which turns out to be universal w.r.t. continuous functions defined on the space of all LMMCs (which includes all graphs) endowed with the WL distance. Finally, the WL distance turns out to be stable w.r.t. a natural variant of the Gromov-Wasserstein (GW) distance for comparing metric Markov chains that we identify. Hence, the WL distance can also be construed as a polynomial time lower bound for the GW distance which is in general NP-hard to compute.
LGJan 28, 2022
Generative Coarse-Graining of Molecular ConformationsWujie Wang, Minkai Xu, Chen Cai et al.
Coarse-graining (CG) of molecular simulations simplifies the particle representation by grouping selected atoms into pseudo-beads and drastically accelerates simulation. However, such CG procedure induces information losses, which makes accurate backmapping, i.e., restoring fine-grained (FG) coordinates from CG coordinates, a long-standing challenge. Inspired by the recent progress in generative models and equivariant networks, we propose a novel model that rigorously embeds the vital probabilistic nature and geometric consistency requirements of the backmapping transformation. Our model encodes the FG uncertainties into an invariant latent space and decodes them back to FG geometries via equivariant convolutions. To standardize the evaluation of this domain, we provide three comprehensive benchmarks based on molecular dynamics trajectories. Experiments show that our approach always recovers more realistic structures and outperforms existing data-driven methods with a significant margin.
LGJan 28, 2022
Neural Approximation of Graph Topological FeaturesZuoyu Yan, Tengfei Ma, Liangcai Gao et al.
Topological features based on persistent homology capture high-order structural information so as to augment graph neural network methods. However, computing extended persistent homology summaries remains slow for large and dense graphs and can be a serious bottleneck for the learning pipeline. Inspired by recent success in neural algorithmic reasoning, we propose a novel graph neural network to estimate extended persistence diagrams (EPDs) on graphs efficiently. Our model is built on algorithmic insights, and benefits from better supervision and closer alignment with the EPD computation algorithm. We validate our method with convincing empirical results on approximating EPDs and downstream graph representation learning tasks. Our method is also efficient; on large and dense graphs, we accelerate the computation by nearly 100 times.
LGJan 25, 2022
Convergence of Invariant Graph NetworksChen Cai, Yusu Wang
Although theoretical properties such as expressive power and over-smoothing of graph neural networks (GNN) have been extensively studied recently, its convergence property is a relatively new direction. In this paper, we investigate the convergence of one powerful GNN, Invariant Graph Network (IGN) over graphs sampled from graphons. We first prove the stability of linear layers for general $k$-IGN (of order $k$) based on a novel interpretation of linear equivariant layers. Building upon this result, we prove the convergence of $k$-IGN under the model of \citet{ruiz2020graphon}, where we access the edge weight but the convergence error is measured for graphon inputs. Under the more natural (and more challenging) setting of \citet{keriven2020convergence} where one can only access 0-1 adjacency matrix sampled according to edge probability, we first show a negative result that the convergence of any IGN is not possible. We then obtain the convergence of a subset of IGNs, denoted as IGN-small, after the edge probability estimation. We show that IGN-small still contains function class rich enough that can approximate spectral GNNs arbitrarily well. Lastly, we perform experiments on various graphon models to verify our statements.
GEO-PHApr 12, 2021
Equivariant geometric learning for digital rock physics: estimating formation factor and effective permeability tensors from Morse graphChen Cai, Nikolaos Vlassis, Lucas Magee et al.
We present a SE(3)-equivariant graph neural network (GNN) approach that directly predicting the formation factor and effective permeability from micro-CT images. FFT solvers are established to compute both the formation factor and effective permeability, while the topology and geometry of the pore space are represented by a persistence-based Morse graph. Together, they constitute the database for training, validating, and testing the neural networks. While the graph and Euclidean convolutional approaches both employ neural networks to generate low-dimensional latent space to represent the features of the micro-structures for forward predictions, the SE(3) equivariant neural network is found to generate more accurate predictions, especially when the training data is limited. Numerical experiments have also shown that the new SE(3) approach leads to predictions that fulfill the material frame indifference whereas the predictions from classical convolutional neural networks (CNN) may suffer from spurious dependence on the coordinate system of the training data. Comparisons among predictions inferred from training the CNN and those from graph convolutional neural networks (GNN) with and without the equivariant constraint indicate that the equivariant graph neural network seems to perform better than the CNN and GNN without enforcing equivariant constraints.
CVMar 18, 2021
Topology-Aware Segmentation Using Discrete Morse TheoryXiaoling Hu, Yusu Wang, Li Fuxin et al.
In the segmentation of fine-scale structures from natural and biomedical images, per-pixel accuracy is not the only metric of concern. Topological correctness, such as vessel connectivity and membrane closure, is crucial for downstream analysis tasks. In this paper, we propose a new approach to train deep image segmentation networks for better topological accuracy. In particular, leveraging the power of discrete Morse theory (DMT), we identify global structures, including 1D skeletons and 2D patches, which are important for topological accuracy. Trained with a novel loss based on these global structures, the network performance is significantly improved especially near topologically challenging locations (such as weak spots of connections and membranes). On diverse datasets, our method achieves superior performance on both the DICE score and topological metrics.
LGFeb 2, 2021
Graph Coarsening with Neural NetworksChen Cai, Dingkang Wang, Yusu Wang
As large-scale graphs become increasingly more prevalent, it poses significant computational challenges to process, extract and analyze large graph data. Graph coarsening is one popular technique to reduce the size of a graph while maintaining essential properties. Despite rich graph coarsening literature, there is only limited exploration of data-driven methods in the field. In this work, we leverage the recent progress of deep learning on graphs for graph coarsening. We first propose a framework for measuring the quality of coarsening algorithm and show that depending on the goal, we need to carefully choose the Laplace operator on the coarse graph and associated projection/lift operators. Motivated by the observation that the current choice of edge weight for the coarse graph may be sub-optimal, we parametrize the weight assignment map with graph neural networks and train it to improve the coarsening quality in an unsupervised way. Through extensive experiments on both synthetic and real networks, we demonstrate that our method significantly improves common graph coarsening methods under various metrics, reduction ratios, graph sizes, and graph types. It generalizes to graphs of larger size ($25\times$ of training graphs), is adaptive to different losses (differentiable and non-differentiable), and scales to much larger graphs than previous work.
LGJun 23, 2020
A Note on Over-Smoothing for Graph Neural NetworksChen Cai, Yusu Wang
Graph Neural Networks (GNNs) have achieved a lot of success on graph-structured data. However, it is observed that the performance of graph neural networks does not improve as the number of layers increases. This effect, known as over-smoothing, has been analyzed mostly in linear cases. In this paper, we build upon previous results \cite{oono2019graph} to further analyze the over-smoothing effect in the general graph neural network architecture. We show when the weight matrix satisfies the conditions determined by the spectrum of augmented normalized Laplacian, the Dirichlet energy of embeddings will converge to zero, resulting in the loss of discriminative power. Using Dirichlet energy to measure "expressiveness" of embedding is conceptually clean; it leads to simpler proofs than \cite{oono2019graph} and can handle more non-linearities.
CVMar 20, 2020
Detection and skeletonization of single neurons and tracer injections using topological methodsDingkang Wang, Lucas Magee, Bing-Xing Huo et al.
Neuroscientific data analysis has traditionally relied on linear algebra and stochastic process theory. However, the tree-like shapes of neurons cannot be described easily as points in a vector space (the subtraction of two neuronal shapes is not a meaningful operation), and methods from computational topology are better suited to their analysis. Here we introduce methods from Discrete Morse (DM) Theory to extract the tree-skeletons of individual neurons from volumetric brain image data, and to summarize collections of neurons labelled by tracer injections. Since individual neurons are topologically trees, it is sensible to summarize the collection of neurons using a consensus tree-shape that provides a richer information summary than the traditional regional 'connectivity matrix' approach. The conceptually elegant DM approach lacks hand-tuned parameters and captures global properties of the data as opposed to previous approaches which are inherently local. For individual skeletonization of sparsely labelled neurons we obtain substantial performance gains over state-of-the-art non-topological methods (over 10% improvements in precision and faster proofreading). The consensus-tree summary of tracer injections incorporates the regional connectivity matrix information, but in addition captures the collective collateral branching patterns of the set of neurons connected to the injection site, and provides a bridge between single-neuron morphology and tracer-injection data.
LGJan 16, 2020
Understanding the Power of Persistence Pairing via Permutation TestChen Cai, Yusu Wang
Recently many efforts have been made to incorporate persistence diagrams, one of the major tools in topological data analysis (TDA), into machine learning pipelines. To better understand the power and limitation of persistence diagrams, we carry out a range of experiments on both graph data and shape data, aiming to decouple and inspect the effects of different factors involved. To this end, we also propose the so-called \emph{permutation test} for persistence diagrams to delineate critical values and pairings of critical values. For graph classification tasks, we note that while persistence pairing yields consistent improvement over various benchmark datasets, it appears that for various filtration functions tested, most discriminative power comes from critical values. For shape segmentation and classification, however, we note that persistence pairing shows significant power on most of the benchmark datasets, and improves over both summaries based on merely critical values, and those based on permutation tests. Our results help provide insights on when persistence diagram based summaries could be more suitable.
CVSep 15, 2019
Road Network Reconstruction from Satellite Images with Machine Learning Supported by Topological MethodsTamal K. Dey, Jiayuan Wang, Yusu Wang
Automatic Extraction of road network from satellite images is a goal that can benefit and even enable new technologies. Methods that combine machine learning (ML) and computer vision have been proposed in recent years which make the task semi-automatic by requiring the user to provide curated training samples. The process can be fully automatized if training samples can be produced algorithmically. Of course, this requires a robust algorithm that can reconstruct the road networks from satellite images reliably so that the output can be fed as training samples. In this work, we develop such a technique by infusing a persistence-guided discrete Morse based graph reconstruction algorithm into ML framework. We elucidate our contributions in two phases. First, in a semi-automatic framework, we combine a discrete-Morse based graph reconstruction algorithm with an existing CNN framework to segment input satellite images. We show that this leads to reconstructions with better connectivity and less noise. Next, in a fully automatic framework, we leverage the power of the discrete-Morse based graph reconstruction algorithm to train a CNN from a collection of images without labelled data and use the same algorithm to produce the final output from the segmented images created by the trained CNN. We apply the discrete-Morse based graph reconstruction algorithm iteratively to improve the accuracy of the CNN. We show promising experimental results of this new framework on datasets from SpaceNet Challenge.
CGJul 31, 2019
A Structural Average of Labeled Merge Trees for Uncertainty VisualizationLin Yan, Yusu Wang, Elizabeth Munch et al.
Physical phenomena in science and engineering are frequently modeled using scalar fields. In scalar field topology, graph-based topological descriptors such as merge trees, contour trees, and Reeb graphs are commonly used to characterize topological changes in the (sub)level sets of scalar fields. One of the biggest challenges and opportunities to advance topology-based visualization is to understand and incorporate uncertainty into such topological descriptors to effectively reason about their underlying data. In this paper, we study a structural average of a set of labeled merge trees and use it to encode uncertainty in data. Specifically, we compute a 1-center tree that minimizes its maximum distance to any other tree in the set under a well-defined metric called the interleaving distance. We provide heuristic strategies that compute structural averages of merge trees whose labels do not fully agree. We further provide an interactive visualization system that resembles a numerical calculator that takes as input a set of merge trees and outputs a tree as their structural average. We also highlight structural similarities between the input and the average and incorporate uncertainty information for visual exploration. We develop a novel measure of uncertainty, referred to as consistency, via a metric-space view of the input trees. Finally, we demonstrate an application of our framework through merge trees that arise from ensembles of scalar fields. Our work is the first to employ interleaving distances and consistency to study a global, mathematically rigorous, structural average of merge trees in the context of uncertainty visualization.
LGNov 8, 2018
A simple yet effective baseline for non-attributed graph classificationChen Cai, Yusu Wang
Graphs are complex objects that do not lend themselves easily to typical learning tasks. Recently, a range of approaches based on graph kernels or graph neural networks have been developed for graph classification and for representation learning on graphs in general. As the developed methodologies become more sophisticated, it is important to understand which components of the increasingly complex methods are necessary or most effective. As a first step, we develop a simple yet meaningful graph representation, and explore its effectiveness in graph classification. We test our baseline representation for the graph classification task on a range of graph datasets. Interestingly, this simple representation achieves similar performance as the state-of-the-art graph kernels and graph neural networks for non-attributed graph classification. Its performance on classifying attributed graphs is slightly weaker as it does not incorporate attributes. However, given its simplicity and efficiency, we believe that it still serves as an effective baseline for attributed graph classification. Our graph representation is efficient (linear-time) to compute. We also provide a simple connection with the graph neural networks. Note that these observations are only for the task of graph classification while existing methods are often designed for a broader scope including node embedding and link prediction. The results are also likely biased due to the limited amount of benchmark datasets available. Nevertheless, the good performance of our simple baseline calls for the development of new, more comprehensive benchmark datasets so as to better evaluate and analyze different graph learning methods. Furthermore, given the computational efficiency of our graph summary, we believe that it is a good candidate as a baseline method for future graph classification (or even other graph learning) studies.
LGJun 27, 2018
A Topological Regularizer for Classifiers via Persistent HomologyChao Chen, Xiuyan Ni, Qinxun Bai et al.
Regularization plays a crucial role in supervised learning. Most existing methods enforce a global regularization in a structure agnostic manner. In this paper, we initiate a new direction and propose to enforce the structural simplicity of the classification boundary by regularizing over its topological complexity. In particular, our measurement of topological complexity incorporates the importance of topological features (e.g., connected components, handles, and so on) in a meaningful manner, and provides a direct control over spurious topological structures. We incorporate the new measurement as a topological penalty in training classifiers. We also pro- pose an efficient algorithm to compute the gradient of such penalty. Our method pro- vides a novel way to topologically simplify the global structure of the model, without having to sacrifice too much of the flexibility of the model. We demonstrate the effectiveness of our new topological regularizer on a range of synthetic and real-world datasets.
MLJun 20, 2017
Unperturbed: spectral analysis beyond Davis-KahanJustin Eldridge, Mikhail Belkin, Yusu Wang
Classical matrix perturbation results, such as Weyl's theorem for eigenvalues and the Davis-Kahan theorem for eigenvectors, are general purpose. These classical bounds are tight in the worst case, but in many settings sub-optimal in the typical case. In this paper, we present perturbation bounds which consider the nature of the perturbation and its interaction with the unperturbed structure in order to obtain significant improvements over the classical theory in many scenarios, such as when the perturbation is random. We demonstrate the utility of these new results by analyzing perturbations in the stochastic blockmodel where we derive much tighter bounds than provided by the classical theory. We use our new perturbation theory to show that a very simple and natural clustering algorithm -- whose analysis was difficult using the classical tools -- nevertheless recovers the communities of the blockmodel exactly even in very sparse graphs.
MLJul 6, 2016
Graphons, mergeons, and so on!Justin Eldridge, Mikhail Belkin, Yusu Wang
In this work we develop a theory of hierarchical clustering for graphs. Our modeling assumption is that graphs are sampled from a graphon, which is a powerful and general model for generating graphs and analyzing large networks. Graphons are a far richer class of graph models than stochastic blockmodels, the primary setting for recent progress in the statistical theory of graph clustering. We define what it means for an algorithm to produce the "correct" clustering, give sufficient conditions in which a method is statistically consistent, and provide an explicit algorithm satisfying these properties.
MLJun 21, 2015
Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical ClusteringJustin Eldridge, Mikhail Belkin, Yusu Wang
Hierarchical clustering is a popular method for analyzing data which associates a tree to a dataset. Hartigan consistency has been used extensively as a framework to analyze such clustering algorithms from a statistical point of view. Still, as we show in the paper, a tree which is Hartigan consistent with a given density can look very different than the correct limit tree. Specifically, Hartigan consistency permits two types of undesirable configurations which we term over-segmentation and improper nesting. Moreover, Hartigan consistency is a limit property and does not directly quantify difference between trees. In this paper we identify two limit properties, separation and minimality, which address both over-segmentation and improper nesting and together imply (but are not implied by) Hartigan consistency. We proceed to introduce a merge distortion metric between hierarchical clusterings and show that convergence in our distance implies both separation and minimality. We also prove that uniform separation and minimality imply convergence in the merge distortion metric. Furthermore, we show that our merge distortion metric is stable under perturbations of the density. Finally, we demonstrate applicability of these concepts by proving convergence results for two clustering algorithms. First, we show convergence (and hence separation and minimality) of the recent robust single linkage algorithm of Chaudhuri and Dasgupta (2010). Second, we provide convergence results on manifolds for topological split tree clustering.