LGMar 22, 2024Code
Simple Graph CondensationZhenbang Xiao, Yu Wang, Shunyu Liu et al.
The burdensome training costs on large-scale graphs have aroused significant interest in graph condensation, which involves tuning Graph Neural Networks (GNNs) on a small condensed graph for use on the large-scale original graph. Existing methods primarily focus on aligning key metrics between the condensed and original graphs, such as gradients, output distribution and trajectories of GNNs, yielding satisfactory performance on downstream tasks. However, these complex metrics necessitate intricate external parameters and can potentially disrupt the optimization process of the condensation graph, making the condensation process highly demanding and unstable. Motivated by the recent success of simplified models across various domains, we propose a simplified approach to metric alignment in graph condensation, aiming to reduce unnecessary complexity inherited from intricate metrics. We introduce the Simple Graph Condensation (SimGC) framework, which aligns the condensed graph with the original graph from the input layer to the prediction layer, guided by a pre-trained Simple Graph Convolution (SGC) model on the original graph. Importantly, SimGC eliminates external parameters and exclusively retains the target condensed graph during the condensation process. This straightforward yet effective strategy achieves a significant speedup of up to 10 times compared to existing graph condensation methods while performing on par with state-of-the-art baselines. Comprehensive experiments conducted on seven benchmark datasets demonstrate the effectiveness of SimGC in prediction accuracy, condensation time, and generalization capability. Our code is available at https://github.com/BangHonor/SimGC.
SIJan 18, 2024Code
Disentangled Condensation for Large-scale GraphsZhenbang Xiao, Yu Wang, Shunyu Liu et al.
Graph condensation has emerged as an intriguing technique to save the expensive training costs of Graph Neural Networks (GNNs) by substituting a condensed small graph with the original graph. Despite the promising results achieved, previous methods usually employ an entangled paradigm of redundant parameters (nodes, edges, GNNs), which incurs complex joint optimization during condensation. This paradigm has considerably impeded the scalability of graph condensation, making it challenging to condense extremely large-scale graphs and generate high-fidelity condensed graphs. Therefore, we propose to disentangle the condensation process into a two-stage GNN-free paradigm, independently condensing nodes and generating edges while eliminating the need to optimize GNNs at the same time. The node condensation module avoids the complexity of GNNs by focusing on node feature alignment with anchors of the original graph, while the edge translation module constructs the edges of the condensed nodes by transferring the original structure knowledge with neighborhood anchors. This simple yet effective approach achieves at least 10 times faster than state-of-the-art methods with comparable accuracy on medium-scale graphs. Moreover, the proposed DisCo can successfully scale up to the Ogbn-papers100M graph containing over 100 million nodes with flexible reduction rates and improves performance on the second-largest Ogbn-products dataset by over 5%. Extensive downstream tasks and ablation study on five common datasets further demonstrate the effectiveness of the proposed DisCo framework. Our code is available at https://github.com/BangHonor/DisCo.
LGMay 1, 2025
From GNNs to Trees: Multi-Granular Interpretability for Graph Neural NetworksJie Yang, Yuwen Wang, Kaixuan Chen et al.
Interpretable Graph Neural Networks (GNNs) aim to reveal the underlying reasoning behind model predictions, attributing their decisions to specific subgraphs that are informative. However, existing subgraph-based interpretable methods suffer from an overemphasis on local structure, potentially overlooking long-range dependencies within the entire graphs. Although recent efforts that rely on graph coarsening have proven beneficial for global interpretability, they inevitably reduce the graphs to a fixed granularity. Such an inflexible way can only capture graph connectivity at a specific level, whereas real-world graph tasks often exhibit relationships at varying granularities (e.g., relevant interactions in proteins span from functional groups, to amino acids, and up to protein domains). In this paper, we introduce a novel Tree-like Interpretable Framework (TIF) for graph classification, where plain GNNs are transformed into hierarchical trees, with each level featuring coarsened graphs of different granularity as tree nodes. Specifically, TIF iteratively adopts a graph coarsening module to compress original graphs (i.e., root nodes of trees) into increasingly coarser ones (i.e., child nodes of trees), while preserving diversity among tree nodes within different branches through a dedicated graph perturbation module. Finally, we propose an adaptive routing module to identify the most informative root-to-leaf paths, providing not only the final prediction but also the multi-granular interpretability for the decision-making process. Extensive experiments on the graph classification benchmarks with both synthetic and real-world datasets demonstrate the superiority of TIF in interpretability, while also delivering a competitive prediction performance akin to the state-of-the-art counterparts.
AIJun 27, 2025
Universal Retrieval for Multimodal Trajectory ModelingXuan Zhang, Ziyan Jiang, Rui Meng et al. · amazon-science
Trajectory data, capturing human actions and environmental states across various modalities, holds significant potential for enhancing AI agent capabilities, particularly in GUI environments. However, how to model the representation of trajectory-level data presents a significant challenge that has not been systematically addressed amid explosive trajectory data growth. In this work, we introduce Multimodal Trajectory Retrieval, bridging the gap between universal retrieval and agent-centric trajectory modeling. We construct the Unified Agent Trajectory Dataset (UATD) from annotated demonstrations and states across diverse real-world scenarios. Based on this, we present GAE-Bench, a benchmark containing a large number of trajectory-based retrieval pairs. In addition, we propose GAE-Retriever, a multimodal retrieval framework that adopts vision-language models and incorporates optimized contrastive learning through a token selection and the GradCache mechanism. Comprehensive evaluations across multiple datasets show that GAE-Retriever consistently outperforms strong baselines in retrieval recall, highlighting its effectiveness in advancing multimodal trajectory retrieval.