LG · Jan 25, 2023
Graph Neural Tangent Kernel: Convergence on Large Graphs
Sanjukta Krishnagopal, Luana Ruiz
Graph neural networks (GNNs) achieve remarkable performance in graph machine learning tasks but can be hard to train on large-graph data, where their learning dynamics are not well understood. We investigate the training dynamics of large-graph GNNs using graph neural tangent kernels (GNTKs) and graphons. In the limit of large width, optimization of an overparametrized neural network (NN) is equivalent to kernel regression on its neural tangent kernel (NTK). Here, we investigate how the GNTK evolves as another independent dimension is varied: the graph size. We use graphons to define limit objects (graphon NNs for GNNs, and graphon NTKs for GNTKs) and prove that, on a sequence of graphs converging to a graphon, the GNTKs converge to the graphon NTK. We further prove that the spectrum of the GNTK, which is related to the directions of fastest learning and therefore matters under early stopping, converges to the spectrum of the graphon NTK. This implies that, in the large-graph limit, a GNTK fitted on a graph of moderate size can be used to solve the same task on the large graph and to infer the learning dynamics of the large-graph GNN. These results are verified empirically on node regression and classification tasks.
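Large-graph limits of this kind build on a simpler fact: graphs sampled from a graphon have normalized shift operators whose spectra approach that of the graphon's integral operator. The NumPy sketch below illustrates only that mechanism (it does not compute a GNTK), assuming the rank-one graphon W(u, v) = uv and the normalization A/n, both chosen purely for illustration:

```python
import numpy as np


def sample_graph(n, rng):
    """Sample an n-node simple graph from the rank-one graphon W(u, v) = u * v."""
    u = rng.uniform(0.0, 1.0, size=n)                  # latent node positions u_i
    probs = np.outer(u, u)                              # edge probabilities W(u_i, u_j)
    upper = np.triu(rng.uniform(size=(n, n)) < probs, k=1)
    return (upper | upper.T).astype(float)              # symmetric, no self-loops


def normalized_spectrum(adj):
    """Eigenvalues of the graphon-normalized shift operator A / n, largest first."""
    n = adj.shape[0]
    return np.sort(np.linalg.eigvalsh(adj / n))[::-1]


rng = np.random.default_rng(0)
for n in (100, 500, 2000):
    top, second = normalized_spectrum(sample_graph(n, rng))[:2]
    # T_W has one nonzero eigenvalue: the integral of u^2 over [0, 1], i.e. 1/3.
    print(f"n={n:5d}  top={top:.3f}  second={second:.3f}  (graphon limit: 0.333, 0)")
```

As n grows, the top eigenvalue should settle near 1/3 while the others shrink toward zero. The GNTK results concern a kernel built on top of such shift operators, which this sketch does not construct.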
LG · May 1
Topological Neural Tangent Kernel
Sanjukta Krishnagopal
Graph neural tangent kernels give a principled infinite-width theory for graph neural networks, but inherit a basic limitation of graph models: they see only pairwise structure. Many relational systems contain higher-order interactions that are more naturally represented by simplicial complexes. We introduce the Topological Neural Tangent Kernel (TopoNTK), an infinite-width kernel for simplicial message passing on edge features. TopoNTK combines lower Hodge interactions, capturing graph-like coupling through shared vertices, with upper Hodge interactions, capturing coupling through filled simplices. This makes the kernel sensitive to topology invisible to graph kernels, allowing complexes with the same graph but different filled simplices to induce different kernels. Beyond expressivity, the Hodge structure gives the kernel an interpretable learning geometry. Edge signals decompose into gradient-like, harmonic, and local circulation components, and the spectrum of the TopoNTK determines how quickly each component is learned. This yields a topological form of spectral bias: components aligned with large-eigenvalue modes are learned quickly, while global harmonic modes, retained through the residual channel, often lie at smaller eigenvalues and are learned more slowly. We prove expressivity, Hodge-alignment, spectral learning, and stability properties, and validate them on synthetic simplicial tasks and DBLP higher-order link prediction. The results show that topology is not merely extra structure; it can provide coordinates that make relational learning more faithful, interpretable, and effective.
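The lower and upper Hodge interactions and the gradient/harmonic/circulation split are standard constructions on a simplicial complex. The NumPy sketch below builds them for a hypothetical four-vertex complex (it does not compute the TopoNTK itself): two complexes share the same graph and differ only in whether triangle (0, 1, 2) is filled.

```python
import numpy as np

# Hypothetical 4-vertex graph with two triangles, (0,1,2) and (1,2,3), sharing edge (1,2).
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]


def boundary_1(n_vertices, edges):
    """Vertex-to-edge incidence B1; edge (i, j) with i < j is oriented i -> j."""
    b1 = np.zeros((n_vertices, len(edges)))
    for e, (i, j) in enumerate(edges):
        b1[i, e], b1[j, e] = -1.0, 1.0
    return b1


def boundary_2(edges, triangles):
    """Edge-to-triangle incidence B2; boundary of (i,j,k) is (i,j) + (j,k) - (i,k)."""
    index = {edge: e for e, edge in enumerate(edges)}
    b2 = np.zeros((len(edges), len(triangles)))
    for t, (i, j, k) in enumerate(triangles):
        b2[index[(i, j)], t] = 1.0
        b2[index[(j, k)], t] = 1.0
        b2[index[(i, k)], t] = -1.0
    return b2


def hodge_laplacian(b1, b2):
    """L1 = B1^T B1 (lower: shared vertices) + B2 B2^T (upper: filled triangles)."""
    return b1.T @ b1 + b2 @ b2.T


def project(basis, signal):
    """Orthogonal projection of `signal` onto the column space of `basis`."""
    if basis.shape[1] == 0:
        return np.zeros_like(signal)
    coeffs, *_ = np.linalg.lstsq(basis, signal, rcond=None)
    return basis @ coeffs


def hodge_decompose(b1, b2, flow):
    """Split an edge flow into gradient, curl, and harmonic parts."""
    gradient = project(b1.T, flow)   # im(B1^T): differences of vertex potentials
    curl = project(b2, flow)         # im(B2): circulation around filled triangles
    harmonic = flow - gradient - curl
    return gradient, curl, harmonic


b1 = boundary_1(4, EDGES)
filled = boundary_2(EDGES, [(0, 1, 2)])   # complex A: triangle (0,1,2) filled
hollow = boundary_2(EDGES, [])            # complex B: same graph, nothing filled

for name, b2 in (("filled", filled), ("hollow", hollow)):
    eigs = np.linalg.eigvalsh(hodge_laplacian(b1, b2))
    print(name, "harmonic dimension:", int(np.sum(np.abs(eigs) < 1e-9)))

flow = np.array([1.0, -0.5, 2.0, 0.3, -1.0])   # arbitrary edge signal
grad, curl, harm = hodge_decompose(b1, filled, flow)
print("gradient:", grad.round(3))
print("curl:    ", curl.round(3))
print("harmonic:", harm.round(3))
```

Filling the triangle removes one harmonic dimension (the filled complex has one, the hollow one has two), even though both complexes share the same graph and therefore the same lower Laplacian B1^T B1.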
LG · May 1
Spectral Graph Sparsification Preserves Representation Geometry in Graph Neural Networks
Sanjukta Krishnagopal
Spectral graph sparsification is a classical tool for reducing graph complexity while preserving Laplacian quadratic forms. In graph neural networks (GNNs), sparsification is often used to accelerate computation while maintaining predictive performance. In this work, we study a complementary representation-level question: does sparsification preserve the geometry of learned embeddings? For polynomial-filter GNNs, we prove that any $ε$-spectral sparsifier induces $O(ε)$ perturbations in polynomial graph filters, multilayer hidden representations, and their Gram matrices. These guarantees imply stability of squared pairwise distances, class means, and covariance structure in embedding space. We further establish finite-time training stability: under smoothness and boundedness assumptions, gradient descent on the dense and sparsified graphs produces weight trajectories whose separation grows at most proportionally to the sparsification distortion. Empirically, effective-resistance sparsification confirms the predicted perturbation chain on synthetic graphs and preserves hidden representation geometry on real datasets. In our experiments, the Gram matrices and training dynamics of dense and sparsified models diverge little even under substantial sparsification, consistent with the predicted stability. Hidden Gram preservation strongly predicts neighborhood preservation and class-centroid stability across FashionMNIST, Cora, and Paul15. Together, these results show that spectral sparsification preserves not only graph operators but also the representation geometry that supports downstream use of GNN embeddings for interpretability.
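Effective-resistance sparsification samples each edge with probability proportional to its weight times its effective resistance and reweights the kept edges (the Spielman and Srivastava scheme). The NumPy sketch below applies it and then measures the spectral distortion ε and the relative change in the Gram matrix of polynomially filtered features. The complete graph, the three-tap filter, the random features, and the sample count are hypothetical choices, not the paper's setup:

```python
import numpy as np


def laplacian(n, edges, weights):
    """Weighted Laplacian L = sum_e w_e (e_i - e_j)(e_i - e_j)^T."""
    lap = np.zeros((n, n))
    for (i, j), w in zip(edges, weights):
        lap[i, i] += w
        lap[j, j] += w
        lap[i, j] -= w
        lap[j, i] -= w
    return lap


def effective_resistances(lap, edges):
    """R_e = (e_i - e_j)^T L^+ (e_i - e_j), using the Moore-Penrose pseudoinverse."""
    pinv = np.linalg.pinv(lap)
    return np.array([pinv[i, i] + pinv[j, j] - 2 * pinv[i, j] for i, j in edges])


def sparsify(n, edges, weights, n_samples, rng):
    """Sample edges with prob. proportional to w_e R_e; reweight so E[L_H] = L_G."""
    weights = np.asarray(weights, dtype=float)
    probs = weights * effective_resistances(laplacian(n, edges, weights), edges)
    probs /= probs.sum()
    counts = rng.multinomial(n_samples, probs)
    new_weights = counts * weights / (n_samples * probs)
    keep = counts > 0
    return [e for e, k in zip(edges, keep) if k], new_weights[keep]


def spectral_distortion(lap_g, lap_h, tol=1e-9):
    """Smallest eps with (1-eps) x^T L_G x <= x^T L_H x <= (1+eps) x^T L_G x on range(L_G)."""
    evals, evecs = np.linalg.eigh(lap_g)
    nonzero = evals > tol
    half_inv = evecs[:, nonzero] / np.sqrt(evals[nonzero])   # columns v_k / sqrt(lambda_k)
    mu = np.linalg.eigvalsh(half_inv.T @ lap_h @ half_inv)
    return float(np.max(np.abs(mu - 1.0)))


def filter_gram(lap, x, coeffs, scale):
    """Gram matrix Z Z^T of Z = sum_k h_k (L / scale)^k X (polynomial graph filter)."""
    shift = lap / scale
    z, power = np.zeros_like(x), np.eye(lap.shape[0])
    for h in coeffs:
        z += h * power @ x
        power = power @ shift
    return z @ z.T


rng = np.random.default_rng(0)
n = 60
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]   # complete graph K_60
weights = np.ones(len(edges))
lap_g = laplacian(n, edges, weights)
kept, new_w = sparsify(n, edges, weights, n_samples=2000, rng=rng)
lap_h = laplacian(n, kept, new_w)

eps = spectral_distortion(lap_g, lap_h)
scale = np.linalg.eigvalsh(lap_g).max()     # same normalization for both graphs
x = rng.standard_normal((n, 4))             # hypothetical node features
coeffs = [1.0, -0.5, 0.25]                  # hypothetical filter taps h_0, h_1, h_2
gram_g = filter_gram(lap_g, x, coeffs, scale)
gram_h = filter_gram(lap_h, x, coeffs, scale)
rel_err = np.linalg.norm(gram_h - gram_g) / np.linalg.norm(gram_g)
print(f"kept {len(kept)}/{len(edges)} edges, eps={eps:.3f}, Gram rel. error={rel_err:.3f}")
```

Repeating this for several `n_samples` values and comparing `eps` with `rel_err` is one way to probe the O(ε) relation empirically; the sketch does not reproduce the paper's bounds or constants.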
SOC-PH · May 19
Sparse Contextual Coupling Reshapes Diffusion Geometry in Multilayer Hypergraphs
Hao Ding, Sanjukta Krishnagopal
Many complex systems combine dense background structure with sparse contextual information. We introduce a diffusion-based framework for analyzing how sparse condition-specific layers reshape diffusion geometry in multilayer hypergraphs. Each layer is represented as a weighted hypergraph, layers are coupled through shared entities, and random walks on the coupled system induce multiscale diffusion distances between nodes. We apply the framework to disease-conditioned gene networks by coupling a dense MSigDB functional gene-set layer to sparse disease-specific DGIdb drug-gene hypergraphs, with disease-associated drugs selected from DDDB and HumanNet-GSP used to define external gene weights. Across Bipolar Disorder, Schizophrenia, Leukemia, and Breast Cancer, the disease-specific layer contains fewer than 2 percent of the genes in the coupled system, yet it substantially changes diffusion distances and community structure. Centrality analysis suggests that this disproportionate effect arises because DGIdb-associated genes occupy influential positions in the MSigDB-derived functional network. The resulting diffusion-derived communities are stable under subsampling and show coherent post hoc functional enrichment, including signaling and neurotransmission categories in neuropsychiatric diseases and immune, translational, and metabolic categories in cancer-associated diseases. Community-level comparisons further reveal disease similarities not reducible to direct DGIdb gene overlap, including a Breast Cancer-Schizophrenia relationship consistent with recent biomedical evidence. These results show that sparse contextual layers can induce interpretable nonlocal changes in higher-order network geometry.
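The random-walk diffusion distances described above can be sketched with a common hypergraph random walk (choose an incident hyperedge in proportion to its weight, then a member uniformly) and the standard diffusion-map distance. This NumPy sketch is a simplification under stated assumptions: the toy hyperedges are hypothetical, and the paper's inter-layer coupling and external gene weights are replaced by simply adding the context hyperedge to the background hypergraph on the same node set.

```python
import numpy as np


def incidence(n_nodes, hyperedges):
    """Node-by-hyperedge incidence matrix H."""
    h = np.zeros((n_nodes, len(hyperedges)))
    for e, members in enumerate(hyperedges):
        h[list(members), e] = 1.0
    return h


def transition_matrix(h, w):
    """P = D_v^{-1} H W D_e^{-1} H^T, with node degree d(v) = sum_e w_e H[v, e]."""
    node_deg = h @ w
    edge_size = h.sum(axis=0)
    return (h * w / edge_size) @ h.T / node_deg[:, None]


def diffusion_distances(p, pi, t):
    """D_t(i, j)^2 = sum_k (P^t[i, k] - P^t[j, k])^2 / pi_k."""
    pt = np.linalg.matrix_power(p, t)
    diff = pt[:, None, :] - pt[None, :, :]
    return np.sqrt(np.sum(diff**2 / pi, axis=2))


# Hypothetical background layer and a sparse context layer on the same 6 nodes.
background = [{0, 1, 2}, {3, 4, 5}, {2, 3}]
context = [{0, 5}]
n_nodes = 6

for name, hyperedges in (("background only", background),
                         ("with context", background + context)):
    h = incidence(n_nodes, hyperedges)
    w = np.ones(len(hyperedges))          # unit hyperedge weights (assumption)
    p = transition_matrix(h, w)
    pi = (h @ w) / (h @ w).sum()          # stationary distribution, proportional to degree
    d = diffusion_distances(p, pi, t=1)
    print(f"{name}: D_1(0, 5) = {d[0, 5]:.3f}")
```

In this toy case the single context hyperedge {0, 5} halves D_1(0, 5), a small-scale analogue of a sparse layer reshaping distances between otherwise distant nodes.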
LG · Jun 1, 2025
Beyond Attention: Learning Spatio-Temporal Dynamics with Emergent Interpretable Topologies
Sai Vamsi Alisetti, Vikas Kalagi, Sanjukta Krishnagopal
Spatio-temporal forecasting is critical in applications such as traffic prediction, energy demand modeling, and weather monitoring. While Graph Attention Networks (GATs) are popular for modeling spatial dependencies, they rely on predefined adjacency structures and dynamic attention scores, introducing inductive biases and computational overhead that can obscure interpretability. We propose InterGAT, a simplified alternative to GAT that replaces masked attention with a fully learnable, symmetric node interaction matrix, capturing latent spatial relationships without relying on fixed graph topologies. Our framework, InterGAT-GRU, which adds a GRU-based temporal decoder, outperforms the GAT-GRU baseline in forecasting accuracy, achieving at least a 21% improvement on the SZ-Taxi dataset and a 6% improvement on the Los-Loop dataset across all forecasting horizons (15 to 60 minutes). It also reduces training time by 60-70% relative to the GAT-GRU baseline. Crucially, the learned interaction matrix reveals interpretable structure: it recovers sparse, topology-aware attention patterns that align with community structure. Spectral and clustering analyses show that the model captures both localized and global dynamics, offering insight into the functional topology driving predictions. This highlights how structure learning can simultaneously support prediction, computational efficiency, and topological interpretability in dynamic graph-based domains.
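A minimal NumPy forward-pass sketch of the core idea: a learnable symmetric node interaction matrix applied without any adjacency mask, followed by a GRU over time. Everything beyond "learnable, symmetric, no fixed topology" and "GRU-based temporal decoder" is an assumption: the symmetrization (Theta + Theta^T) / 2, the tanh feature map, the one-step linear readout, and all names and sizes are hypothetical, and training by backpropagation is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class InteractionGRU:
    """Hypothetical sketch (forward pass only), not the InterGAT reference code."""

    def __init__(self, n_nodes, in_dim, hidden, rng, scale=0.1):
        self.theta = scale * rng.standard_normal((n_nodes, n_nodes))  # free parameters
        self.w_in = scale * rng.standard_normal((in_dim, hidden))
        # GRU weights for the reset (r), update (z), and candidate (n) gates.
        self.w_x = {g: scale * rng.standard_normal((hidden, hidden)) for g in "rzn"}
        self.w_h = {g: scale * rng.standard_normal((hidden, hidden)) for g in "rzn"}
        self.w_out = scale * rng.standard_normal((hidden, 1))

    def interaction(self):
        """Symmetric interaction matrix S = (Theta + Theta^T) / 2, no adjacency mask."""
        return 0.5 * (self.theta + self.theta.T)

    def forward(self, x_seq):
        """x_seq: (time, n_nodes, in_dim) -> one-step forecast per node, shape (n_nodes,)."""
        s = self.interaction()
        h = np.zeros((x_seq.shape[1], self.w_in.shape[1]))
        for x_t in x_seq:
            m = np.tanh(s @ x_t @ self.w_in)               # spatial mixing via learned S
            r = sigmoid(m @ self.w_x["r"] + h @ self.w_h["r"])
            z = sigmoid(m @ self.w_x["z"] + h @ self.w_h["z"])
            cand = np.tanh(m @ self.w_x["n"] + r * (h @ self.w_h["n"]))
            h = (1.0 - z) * cand + z * h                    # standard GRU state update
        return (h @ self.w_out).ravel()


rng = np.random.default_rng(0)
model = InteractionGRU(n_nodes=10, in_dim=1, hidden=8, rng=rng)
x_seq = rng.standard_normal((12, 10, 1))   # 12 past steps, 10 sensors, 1 feature each
print(model.forward(x_seq).shape)          # (10,)
```

Because S is a dense parameter rather than a masked attention score, its learned entries can be inspected directly, for example with spectral clustering, which is the kind of analysis the abstract describes.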
NC · Dec 10, 2021
Encoding priors in the brain: a reinforcement learning model for mouse decision making
Sanjukta Krishnagopal, Peter Latham
In two-alternative forced choice tasks, prior knowledge can improve performance, especially when operating near the psychophysical threshold. For instance, if subjects know that one choice is much more likely than the other, they can make that choice when evidence is weak. A common hypothesis for these kinds of tasks is that the prior is stored in neural activity. Here we propose a different hypothesis: the prior is stored in synaptic strengths. We study the International Brain Laboratory task, in which a grating appears on either the right or left side of a screen, and a mouse has to move a wheel to bring the grating to the center. The grating is often low in contrast, which makes the task relatively difficult, and the prior probability that the grating appears on the right is either 80% or 20%, in (unsignaled) blocks of about 50 trials. We model this as a reinforcement learning task, using a feedforward neural network to map states to actions and adjusting the network's weights to maximize reward via policy gradient. Our model maintains an internal state that stores an estimate of the grating and its confidence and is updated by Bayesian inference; the model can also switch between engaged and disengaged states to mimic animal behavior. This model reproduces the main experimental finding: the psychometric curve with respect to contrast shifts within about 10 trials after a block switch. Also, as in the experiments, the difference in neuronal activity between right and left blocks is small in our model: it is virtually impossible to decode block structure from activity on single trials if noise is about 2%. The hypothesis that priors are stored in weights is difficult to test, but the technology to do so should be available in the not-so-distant future.
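One way to make a Bayesian update of the block prior concrete is a two-state hidden Markov model filter over the unsignaled block. A minimal Python sketch, assuming the side is fully observed after each trial and a switch rate of 1/50 per trial (both assumptions for illustration); it covers only the belief update, not the paper's policy-gradient network or its engaged/disengaged states:

```python
P_RIGHT_GIVEN_RIGHT_BLOCK = 0.8   # from the task description
P_RIGHT_GIVEN_LEFT_BLOCK = 0.2
HAZARD = 1 / 50                   # assumed per-trial switch probability (~50-trial blocks)


def predict(belief):
    """Allow for a possible unsignaled block switch before the next trial."""
    return belief * (1 - HAZARD) + (1 - belief) * HAZARD


def update_block_belief(belief, side):
    """belief = P(right-biased block | past sides); side = 1 (right) or 0 (left)."""
    prior = predict(belief)
    like_r = P_RIGHT_GIVEN_RIGHT_BLOCK if side == 1 else 1 - P_RIGHT_GIVEN_RIGHT_BLOCK
    like_l = P_RIGHT_GIVEN_LEFT_BLOCK if side == 1 else 1 - P_RIGHT_GIVEN_LEFT_BLOCK
    return prior * like_r / (prior * like_r + (1 - prior) * like_l)


def prob_right_next(belief):
    """Predicted probability that the next grating appears on the right."""
    prior = predict(belief)
    return prior * P_RIGHT_GIVEN_RIGHT_BLOCK + (1 - prior) * P_RIGHT_GIVEN_LEFT_BLOCK


belief = 0.5
for _ in range(40):                  # a long run of right-side trials
    belief = update_block_belief(belief, side=1)
for trial in range(1, 11):           # unsignaled switch: left-side trials from here on
    belief = update_block_belief(belief, side=0)
    print(trial, round(prob_right_next(belief), 3))
```

Because this idealized observer sees every side without noise, its predicted probability crosses 0.5 within a few trials of the switch, faster than the roughly 10 trials reported in the abstract; it illustrates the form of the update, not the model's timing.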
QM · Sep 29, 2021
Stroke recovery phenotyping through network trajectory approaches and graph neural networks
Sanjukta Krishnagopal, Keith Lohse, Robynne Braun
Stroke is a leading cause of neurological injury characterized by impairments in multiple neurological domains, including cognition, language, sensory and motor functions. Clinical recovery in these domains is tracked using a wide range of measures that may be continuous, ordinal, interval or categorical in nature, which presents challenges for standard multivariate regression approaches. This has hindered stroke researchers' ability to achieve an integrated picture of the complex, time-evolving interactions among symptoms. Here we use tools from network science and machine learning that are particularly well-suited to extracting underlying patterns in such data and may assist in predicting recovery patterns. To demonstrate the utility of this approach, we analyzed data from the NINDS tPA trial using the Trajectory Profile Clustering (TPC) method to identify distinct stroke recovery patterns across 11 neurological domains at 5 discrete time points. Our analysis identified 3 distinct stroke trajectory profiles that align with clinically relevant stroke syndromes, characterized both by distinct clusters of symptoms and by differing degrees of symptom severity. We then validated our approach using graph neural networks, assessing how well patients could be stratified into these trajectory profiles at early versus later time points post-stroke. We demonstrate that trajectory profile clustering is an effective method for identifying clinically relevant recovery subtypes in multidimensional longitudinal datasets, and for early prediction of symptom progression subtypes in individual patients. This paper is the first to introduce network trajectory approaches for stroke recovery phenotyping, and it aims to enhance the translation of such novel computational approaches into practical clinical application.
LG · Oct 2, 2020
Encoded Prior Sliced Wasserstein AutoEncoder for learning latent manifold representations
Sanjukta Krishnagopal, Jacob Bedrossian
While variational autoencoders have been successful in several tasks, conventional priors are limited in their ability to encode the underlying structure of the input data. We introduce an Encoded Prior Sliced Wasserstein AutoEncoder wherein an additional prior-encoder network learns an embedding of the data manifold that preserves topological and geometric properties of the data, thus improving the structure of latent space. The autoencoder and prior-encoder networks are trained iteratively using the Sliced Wasserstein distance. The effectiveness of the learned manifold encoding is explored by traversing latent space through interpolations along geodesics, which generate samples that lie on the data manifold and are hence more realistic than those produced by Euclidean interpolation. To this end, we introduce a graph-based algorithm for exploring the data manifold and interpolating along network geodesics in latent space by maximizing the density of samples along the path while minimizing total energy. We use 3D spiral data to show that, unlike conventional autoencoders, the prior encodes the geometry underlying the data, and to demonstrate exploration of the embedded data manifold through the network algorithm. We apply our framework to benchmark image datasets to demonstrate the advantages of learning data representations in outlier generation, latent structure, and geodesic interpolation.
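The Sliced Wasserstein distance replaces one high-dimensional optimal-transport problem with many one-dimensional ones, each solvable exactly by sorting. A minimal NumPy sketch of the standard Monte Carlo estimator for equal-size samples; it computes the distance only (training the two networks would require evaluating it inside an autodiff framework, not shown), and the Gaussian stand-ins for encoder outputs and prior samples are hypothetical:

```python
import numpy as np


def random_directions(n_proj, dim, rng):
    """Unit vectors drawn uniformly on the sphere S^{dim-1}."""
    theta = rng.standard_normal((n_proj, dim))
    return theta / np.linalg.norm(theta, axis=1, keepdims=True)


def sliced_wasserstein(x, y, directions):
    """Monte Carlo sliced 2-Wasserstein distance between equal-size samples x and y.

    Projecting onto each direction gives 1D samples whose W2 distance is obtained
    by sorting both and matching order statistics.
    """
    px = np.sort(x @ directions.T, axis=0)   # (n_samples, n_proj)
    py = np.sort(y @ directions.T, axis=0)
    return float(np.sqrt(np.mean((px - py) ** 2)))


rng = np.random.default_rng(0)
dirs = random_directions(200, 3, rng)
codes = rng.standard_normal((500, 3))          # stand-in for encoder outputs
prior = rng.standard_normal((500, 3)) + 2.0    # stand-in for prior samples
print(f"SW_2 estimate: {sliced_wasserstein(codes, prior, dirs):.3f}")
```

Sorting makes each slice cost O(n log n), which is why the sliced distance is cheap enough to use as a training loss; more projections reduce the Monte Carlo variance.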
SP · Oct 18, 2019
Separation of Chaotic Signals by Reservoir Computing
Sanjukta Krishnagopal, Michelle Girvan, Edward Ott et al.
We demonstrate the utility of machine learning in the separation of superimposed chaotic signals using a technique called Reservoir Computing. We assume no knowledge of the dynamical equations that produce the signals, and require only training data consisting of finite time samples of the component signals. We test our method on signals that are formed as linear combinations of signals from two Lorenz systems with different parameters. Comparing our nonlinear method with the optimal linear solution to the separation problem, the Wiener filter, we find that our method significantly outperforms the Wiener filter in all the scenarios we study. The difference is particularly striking when the component signals have similar frequency spectra. Indeed, our method works well even when the component frequency spectra are indistinguishable, a case in which a Wiener filter performs essentially no separation.
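The abstract does not specify the reservoir, so the following is a minimal echo-state-network sketch in NumPy under stated assumptions: a 300-node sparse random tanh reservoir, a ridge-regression readout trained to recover one component, and two Lorenz systems that differ only in rho (28 and 40, illustrative values, not the paper's). The mixing weights, washout length, and regularization are likewise assumptions, and the Wiener-filter comparison is not reproduced.

```python
import numpy as np


def lorenz_series(n_steps, rho, dt=0.01, sigma=10.0, beta=8.0 / 3.0, x0=(1.0, 1.0, 1.0)):
    """x-component of a Lorenz system integrated with fourth-order Runge-Kutta."""
    def f(s):
        x, y, z = s
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

    s = np.array(x0, dtype=float)
    out = np.empty(n_steps)
    for t in range(n_steps):
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[t] = s[0]
    return out


def standardize(v):
    return (v - v.mean()) / v.std()


def run_reservoir(u, n_res=300, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Echo state network: r_{t+1} = tanh(A r_t + w_in u_t); returns states and A."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_res, n_res)) * (rng.random((n_res, n_res)) < 0.05)
    a *= spectral_radius / np.max(np.abs(np.linalg.eigvals(a)))
    w_in = input_scale * rng.uniform(-1.0, 1.0, n_res)
    r = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t, u_t in enumerate(u):
        r = np.tanh(a @ r + w_in * u_t)
        states[t] = r
    return states, a


def ridge_readout(states, target, beta=1e-4):
    """Linear readout W minimizing ||states W - target||^2 + beta ||W||^2."""
    gram = states.T @ states + beta * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.T @ target)


n_steps, washout, n_train = 12000, 500, 8000
s1 = standardize(lorenz_series(n_steps, rho=28.0))
s2 = standardize(lorenz_series(n_steps, rho=40.0, x0=(0.5, -1.0, 2.0)))
mix = 0.5 * (s1 + s2)                       # observed superposition
states, _ = run_reservoir(mix)
train, test = slice(washout, n_train), slice(n_train, n_steps)
w_out = ridge_readout(states[train], s1[train])
pred = states[test] @ w_out
nrmse = np.sqrt(np.mean((pred - s1[test]) ** 2)) / s1[test].std()
print(f"test NRMSE for component 1: {nrmse:.3f}")
```

Only the linear readout is trained; the reservoir stays fixed, which is what lets the method learn a nonlinear separation from training samples alone without knowledge of the Lorenz equations.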