CVApr 19, 2022
SePiCo: Semantic-Guided Pixel Contrast for Domain Adaptive Semantic SegmentationBinhui Xie, Shuang Li, Mingjia Li et al. · tsinghua
Domain adaptive semantic segmentation attempts to make satisfactory dense predictions on an unlabeled target domain by utilizing the supervised model trained on a labeled source domain. In this work, we propose Semantic-Guided Pixel Contrast (SePiCo), a novel one-stage adaptation framework that highlights the semantic concepts of individual pixels to promote learning of class-discriminative and class-balanced pixel representations across domains, eventually boosting the performance of self-training methods. Specifically, to explore proper semantic concepts, we first investigate a centroid-aware pixel contrast that employs the category centroids of the entire source domain or a single source image to guide the learning of discriminative features. Considering the possible lack of category diversity in semantic concepts, we then blaze a trail of distributional perspective to involve a sufficient quantity of instances, namely distribution-aware pixel contrast, in which we approximate the true distribution of each semantic category from the statistics of labeled source data. Moreover, such an optimization objective can derive a closed-form upper bound by implicitly involving an infinite number of (dis)similar pairs, making it computationally efficient. Extensive experiments show that SePiCo not only helps stabilize training but also yields discriminative representations, making significant progress on both synthetic-to-real and daytime-to-nighttime adaptation scenarios.
CVJun 28, 2022Code
Multi-Prior Learning via Neural Architecture Search for Blind Face RestorationYanjiang Yu, Puyang Zhang, Kaihao Zhang et al.
Blind Face Restoration (BFR) aims to recover high-quality face images from low-quality ones and usually resorts to facial priors for improving restoration performance. However, current methods still suffer from two major difficulties: 1) how to derive a powerful network architecture without extensive hand tuning; 2) how to capture complementary information from multiple facial priors in one network to improve restoration performance. To this end, we propose a Face Restoration Searching Network (FRSNet) to adaptively search the suitable feature extraction architecture within our specified search space, which can directly contribute to the restoration quality. On the basis of FRSNet, we further design our Multiple Facial Prior Searching Network (MFPSNet) with a multi-prior learning scheme. MFPSNet optimally extracts information from diverse facial priors and fuses the information into image features, ensuring that both external guidance and internal features are reserved. In this way, MFPSNet takes full advantage of semantic-level (parsing maps), geometric-level (facial heatmaps), reference-level (facial dictionaries) and pixel-level (degraded images) information and thus generates faithful and realistic images. Quantitative and qualitative experiments show that MFPSNet performs favorably on both synthetic and real-world datasets against the state-of-the-art BFR methods. The codes are publicly available at: https://github.com/YYJ1anG/MFPSNet.
LGJul 22, 2022
Robust Knowledge Adaptation for Dynamic Graph Neural NetworksHanjie Li, Changsheng Li, Kaituo Feng et al.
Graph structured data often possess dynamic characters in nature. Recent years have witnessed the increasing attentions paid to dynamic graph neural networks for modelling graph data. However, almost all existing approaches operate under the assumption that, upon the establishment of a new link, the embeddings of the neighboring nodes should undergo updates to learn temporal dynamics. Nevertheless, these approaches face the following limitation: If the node introduced by a new connection contains noisy information, propagating its knowledge to other nodes becomes unreliable and may even lead to the collapse of the model. In this paper, we propose Ada-DyGNN: a robust knowledge Adaptation framework via reinforcement learning for Dynamic Graph Neural Networks. In contrast to previous approaches, which update the embeddings of the neighbor nodes immediately after adding a new link, Ada-DyGNN adaptively determines which nodes should be updated. Considering that the decision to update the embedding of one neighbor node can significantly impact other neighbor nodes, we conceptualize the node update selection as a sequence decision problem and employ reinforcement learning to address it effectively. By this means, we can adaptively propagate knowledge to other nodes for learning robust node embedding representations. To the best of our knowledge, our approach constitutes the first attempt to explore robust knowledge adaptation via reinforcement learning specifically tailored for dynamic graph neural networks. Extensive experiments on three benchmark datasets demonstrate that Ada-DyGNN achieves the state-of-the-art performance. In addition, we conduct experiments by introducing different degrees of noise into the dataset, quantitatively and qualitatively illustrating the robustness of Ada-DyGNN.
LGApr 26, 2022
Self-Supervised Information Bottleneck for Deep Multi-View Subspace ClusteringShiye Wang, Changsheng Li, Yanming Li et al.
In this paper, we explore the problem of deep multi-view subspace clustering framework from an information-theoretic point of view. We extend the traditional information bottleneck principle to learn common information among different views in a self-supervised manner, and accordingly establish a new framework called Self-supervised Information Bottleneck based Multi-view Subspace Clustering (SIB-MSC). Inheriting the advantages from information bottleneck, SIB-MSC can learn a latent space for each view to capture common information among the latent representations of different views by removing superfluous information from the view itself while retaining sufficient information for the latent representations of other views. Actually, the latent representation of each view provides a kind of self-supervised signal for training the latent representations of other views. Moreover, SIB-MSC attempts to learn the other latent space for each view to capture the view-specific information by introducing mutual information based regularization terms, so as to further improve the performance of multi-view subspace clustering. To the best of our knowledge, this is the first work to explore information bottleneck for multi-view subspace clustering. Extensive experiments on real-world multi-view data demonstrate that our method achieves superior performance over the related state-of-the-art methods.
LGJun 14, 2022
FreeKD: Free-direction Knowledge Distillation for Graph Neural NetworksKaituo Feng, Changsheng Li, Ye Yuan et al.
Knowledge distillation (KD) has demonstrated its effectiveness to boost the performance of graph neural networks (GNNs), where its goal is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is actually difficult to train a satisfactory teacher GNN due to the well-known over-parametrized and over-smoothing issues, leading to invalid knowledge transfer in practical applications. In this paper, we propose the first Free-direction Knowledge Distillation framework via Reinforcement learning for GNNs, called FreeKD, which is no longer required to provide a deeper well-optimized teacher GNN. The core idea of our work is to collaboratively build two shallower GNNs in an effort to exchange knowledge between them via reinforcement learning in a hierarchical way. As we observe that one typical GNN model often has better and worse performances at different nodes during training, we devise a dynamic and free-direction knowledge transfer strategy that consists of two levels of actions: 1) node-level action determines the directions of knowledge transfer between the corresponding nodes of two networks; and then 2) structure-level action determines which of the local structures generated by the node-level actions to be propagated. In essence, our FreeKD is a general and principled framework which can be naturally compatible with GNNs of different architectures. Extensive experiments on five benchmark datasets demonstrate our FreeKD outperforms two base GNNs in a large margin, and shows its efficacy to various GNNs. More surprisingly, our FreeKD has comparable or even better performance than traditional KD algorithms that distill knowledge from a deeper and stronger teacher GNN.
CVJun 8, 2022
Blind Face Restoration: Benchmark Datasets and a Baseline ModelPuyang Zhang, Kaihao Zhang, Wenhan Luo et al.
Blind Face Restoration (BFR) aims to construct a high-quality (HQ) face image from its corresponding low-quality (LQ) input. Recently, many BFR methods have been proposed and they have achieved remarkable success. However, these methods are trained or evaluated on privately synthesized datasets, which makes it infeasible for the subsequent approaches to fairly compare with them. To address this problem, we first synthesize two blind face restoration benchmark datasets called EDFace-Celeb-1M (BFR128) and EDFace-Celeb-150K (BFR512). State-of-the-art methods are benchmarked on them under five settings including blur, noise, low resolution, JPEG compression artifacts, and the combination of them (full degradation). To make the comparison more comprehensive, five widely-used quantitative metrics and two task-driven metrics including Average Face Landmark Distance (AFLD) and Average Face ID Cosine Similarity (AFICS) are applied. Furthermore, we develop an effective baseline model called Swin Transformer U-Net (STUNet). The STUNet with U-net architecture applies an attention mechanism and a shifted windowing scheme to capture long-range pixel interactions and focus more on significant features while still being trained efficiently. Experimental results show that the proposed baseline method performs favourably against the SOTA methods on various BFR tasks.
LGDec 4, 2022
FedKNOW: Federated Continual Learning with Signature Task Knowledge Integration at EdgeYaxin Luopan, Rui Han, Qinglong Zhang et al.
Deep Neural Networks (DNNs) have been ubiquitously adopted in internet of things and are becoming an integral of our daily life. When tackling the evolving learning tasks in real world, such as classifying different types of objects, DNNs face the challenge to continually retrain themselves according to the tasks on different edge devices. Federated continual learning is a promising technique that offers partial solutions but yet to overcome the following difficulties: the significant accuracy loss due to the limited on-device processing, the negative knowledge transfer caused by the limited communication of non-IID data, and the limited scalability on the tasks and edge devices. In this paper, we propose FedKNOW, an accurate and scalable federated continual learning framework, via a novel concept of signature task knowledge. FedKNOW is a client side solution that continuously extracts and integrates the knowledge of signature tasks which are highly influenced by the current task. Each client of FedKNOW is composed of a knowledge extractor, a gradient restorer and, most importantly, a gradient integrator. Upon training for a new task, the gradient integrator ensures the prevention of catastrophic forgetting and mitigation of negative knowledge transfer by effectively combining signature tasks identified from the past local tasks and other clients' current tasks through the global model. We implement FedKNOW in PyTorch and extensively evaluate it against state-of-the-art techniques using popular federated continual learning benchmarks. Extensive evaluation results on heterogeneous edge devices show that FedKNOW improves model accuracy by 63.24% without increasing model training time, reduces communication cost by 34.28%, and achieves more improvements under difficult scenarios such as large numbers of tasks or clients, and training different complex networks.
23.0IRMay 16
Dual-Tree LLM-Enhanced Negative Sampling for Implicit Collaborative FilteringJiayi Wu, Zhengyu Wu, Xunkai Li et al.
Negative sampling is a pivotal technique in implicit collaborative filtering (CF) recommendation, enabling efficient and effective training by contrasting observed interactions with sampled unobserved ones. Recently, large language models (LLMs) have shown promise in recommender systems; however, research on LLM-empowered negative sampling remains underexplored. Existing methods heavily rely on textual information and task-specific fine-tuning, limiting practical applicability. To this end, we propose a text-free and fine-tuning-free Dual-Tree LLM-enhanced Negative Sampling method (DTL-NS). It consists of two modules: (i) an offline false negative identification module that leverages hierarchical index trees to transform collaborative structural and latent semantic information into structured item-ID encodings for LLM inference, enabling accurate identification of false negatives; and (ii) a multi-view hard negative sampling module that combines user-item preference scores with item-item hierarchical similarities from these encodings to mine high-quality negatives, thus improving the discriminative ability of recommender models. Extensive experiments demonstrate the effectiveness of DTL-NS. Moreover, DTL-NS shows broad applicability across different implicit CF models, negative sampling methods, and LLMs, consistently enhancing recommendation performance.
AISep 25, 2024
Harnessing Diversity for Important Data Selection in Pretraining Large Language ModelsChi Zhang, Huaping Zhong, Kuan Zhang et al.
Data selection is of great significance in pre-training large language models, given the variation in quality within the large-scale available training corpora. To achieve this, researchers are currently investigating the use of data influence to measure the importance of data instances, $i.e.,$ a high influence score indicates that incorporating this instance to the training set is likely to enhance the model performance. Consequently, they select the top-$k$ instances with the highest scores. However, this approach has several limitations. (1) Computing the influence of all available data is time-consuming. (2) The selected data instances are not diverse enough, which may hinder the pre-trained model's ability to generalize effectively to various downstream tasks. In this paper, we introduce \texttt{Quad}, a data selection approach that considers both quality and diversity by using data influence to achieve state-of-the-art pre-training results. In particular, noting that attention layers capture extensive semantic details, we have adapted the accelerated $iHVP$ computation methods for attention layers, enhancing our ability to evaluate the influence of data, $i.e.,$ its quality. For the diversity, \texttt{Quad} clusters the dataset into similar data instances within each cluster and diverse instances across different clusters. For each cluster, if we opt to select data from it, we take some samples to evaluate the influence to prevent processing all instances. To determine which clusters to select, we utilize the classic Multi-Armed Bandit method, treating each cluster as an arm. This approach favors clusters with highly influential instances (ensuring high quality) or clusters that have been selected less frequently (ensuring diversity), thereby well balancing between quality and diversity.
79.5LGApr 14Code
RoleMAG: Learning Neighbor Roles in Multimodal GraphsYilong Zuo, Xunkai Li, Zhihan Zhang et al.
Multimodal attributed graphs (MAGs) combine multimodal node attributes with structured relations. However, existing methods usually perform shared message passing on a single graph and implicitly assume that the same neighbors are equally useful for all modalities. In practice, neighbors that benefit one modality may interfere with another, blurring modality-specific signals under shared propagation. To address this issue, we propose RoleMAG, a multimodal graph framework that learns how different neighbors should participate in propagation. Concretely, RoleMAG distinguishes whether a neighbor should provide shared, complementary, or heterophilous signals, and routes them through separate propagation channels. This enables cross-modal completion from complementary neighbors while keeping heterophilous ones out of shared smoothing. Extensive experiments on three graph-centric MAG benchmarks show that RoleMAG achieves the best results on RedditS and Bili\_Dance, while remaining competitive on Toys. Ablation, robustness, and efficiency analyses further support the effectiveness of the proposed role-aware propagation design. Our code is available at https://anonymous.4open.science/r/RoleMAG-7EE0/
LGFeb 5Code
OpenMAG: A Comprehensive Benchmark for Multimodal-Attributed GraphChenxi Wan, Xunkai Li, Yilong Zuo et al.
Multimodal-Attributed Graph (MAG) learning has achieved remarkable success in modeling complex real-world systems by integrating graph topology with rich attributes from multiple modalities. With the rapid proliferation of novel MAG models capable of handling intricate cross-modal semantics and structural dependencies, establishing a rigorous and unified evaluation standard has become imperative. Although existing benchmarks have facilitated initial progress, they exhibit critical limitations in domain coverage, encoder flexibility, model diversity, and task scope, presenting significant challenges to fair evaluation. To bridge this gap, we present OpenMAG, a comprehensive benchmark that integrates 19 datasets across 6 domains and incorporates 16 encoders to support both static and trainable feature encoding. OpenMAG further implements a standardized library of 24 state-of-the-art models and supports 8 downstream tasks, enabling fair comparisons within a unified framework. Through systematic assessment of necessity, data quality, effectiveness, robustness, and efficiency, we derive 14 fundamental insights into MAG learning to guide future advancements. Our code is available at https://github.com/YUKI-N810/OpenMAG.
LGJul 2, 2023
Shared Growth of Graph Neural Networks via Prompted Free-direction Knowledge DistillationKaituo Feng, Yikun Miao, Changsheng Li et al.
Knowledge distillation (KD) has shown to be effective to boost the performance of graph neural networks (GNNs), where the typical objective is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is often quite challenging to train a satisfactory deeper GNN due to the well-known over-parametrized and over-smoothing issues, leading to invalid knowledge transfer in practical applications. In this paper, we propose the first Free-direction Knowledge Distillation framework via reinforcement learning for GNNs, called FreeKD, which is no longer required to provide a deeper well-optimized teacher GNN. Our core idea is to collaboratively learn two shallower GNNs to exchange knowledge between them. As we observe that one typical GNN model often exhibits better and worse performances at different nodes during training, we devise a dynamic and free-direction knowledge transfer strategy that involves two levels of actions: 1) node-level action determines the directions of knowledge transfer between the corresponding nodes of two networks; and then 2) structure-level action determines which of the local structures generated by the node-level actions to be propagated. Additionally, considering that different augmented graphs can potentially capture distinct perspectives of the graph data, we propose FreeKD-Prompt that learns undistorted and diverse augmentations based on prompt learning for exchanging varied knowledge. Furthermore, instead of confining knowledge exchange within two GNNs, we develop FreeKD++ to enable free-direction knowledge transfer among multiple GNNs. Extensive experiments on five benchmark datasets demonstrate our approaches outperform the base GNNs in a large margin. More surprisingly, our FreeKD has comparable or even better performance than traditional KD algorithms that distill knowledge from a deeper and stronger teacher GNN.
LGJul 20, 2023
DREAM: Domain-free Reverse Engineering Attributes of Black-box ModelRongqing Li, Jiaqi Yu, Changsheng Li et al.
Deep learning models are usually black boxes when deployed on machine learning platforms. Prior works have shown that the attributes ($e.g.$, the number of convolutional layers) of a target black-box neural network can be exposed through a sequence of queries. There is a crucial limitation: these works assume the dataset used for training the target model to be known beforehand and leverage this dataset for model attribute attack. However, it is difficult to access the training dataset of the target black-box model in reality. Therefore, whether the attributes of a target black-box model could be still revealed in this case is doubtful. In this paper, we investigate a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target Model, called DREAM, without requiring the availability of the target model's training dataset, and put forward a general and principled framework by casting this problem as an out of distribution (OOD) generalization problem. In this way, we can learn a domain-agnostic model to inversely infer the attributes of a target black-box model with unknown training data. This makes our method one of the kinds that can gracefully apply to an arbitrary domain for model attribute reverse engineering with strong generalization ability. Extensive experimental studies are conducted and the results validate the superiority of our proposed method over the baselines.
CVOct 18, 2023
Learning to Generate Parameters of ConvNets for Unseen Image DataShiye Wang, Kaituo Feng, Changsheng Li et al.
Typical Convolutional Neural Networks (ConvNets) depend heavily on large amounts of image data and resort to an iterative optimization algorithm (e.g., SGD or Adam) to learn network parameters, which makes training very time- and resource-intensive. In this paper, we propose a new training paradigm and formulate the parameter learning of ConvNets into a prediction task: given a ConvNet architecture, we observe there exist correlations between image datasets and their corresponding optimal network parameters, and explore if we can learn a hyper-mapping between them to capture the relations, such that we can directly predict the parameters of the network for an image dataset never seen during the training phase. To do this, we put forward a new hypernetwork based model, called PudNet, which intends to learn a mapping between datasets and their corresponding network parameters, and then predicts parameters for unseen data with only a single forward propagation. Moreover, our model benefits from a series of adaptive hyper recurrent units sharing weights to capture the dependencies of parameters among different network layers. Extensive experiments demonstrate that our proposed method achieves good efficacy for unseen image datasets on two kinds of settings: Intra-dataset prediction and Inter-dataset prediction. Our PudNet can also well scale up to large-scale datasets, e.g., ImageNet-1K. It takes 8967 GPU seconds to train ResNet-18 on the ImageNet-1K using GC from scratch and obtain a top-5 accuracy of 44.65%. However, our PudNet costs only 3.89 GPU seconds to predict the network parameters of ResNet-18 achieving comparable performance (44.92%), more than 2,300 times faster than the traditional training paradigm.
LGAug 21, 2024
Macformer: Transformer with Random Maclaurin Feature AttentionYuhan Guo, Lizhong Ding, Ye Yuan et al.
Random feature attention (RFA) adopts random fourier feature (RFF) methods to approximate the softmax function, resulting in a linear time and space attention mechanism that enables the construction of an efficient Transformer. Inspired by RFA, we propose Macformer, a Transformer architecture that employs random Maclaurin features (RMF) to approximate various dot-product kernels, thereby accelerating attention computations for long sequence. Macformer consists of Random Maclaurin Feature Attention (RMFA) and pre-post Scaling Batch Normalization (ppSBN), the former is an unbiased approximation for dot-product kernelized attention and the later is a two-stage regularization mechanism guaranteeing the error of RMFA. We conducted toy experiments to demonstrate the efficiency of RMFA and ppSBN, and experiments on long range arena (LRA) benchmark to validate the acceleration and accuracy of Macformer with different dot-product kernels. Experiment results of Macformer are consistent with our theoretical analysis.
LGAug 29, 2024
OpenFGL: A Comprehensive Benchmark for Federated Graph LearningXunkai Li, Yinlin Zhu, Boyang Pang et al.
Federated graph learning (FGL) is a promising distributed training paradigm for graph neural networks across multiple local systems without direct data sharing. This approach inherently involves large-scale distributed graph processing, which closely aligns with the challenges and research focuses of graph-based data systems. Despite the proliferation of FGL, the diverse motivations from real-world applications, spanning various research backgrounds and settings, pose a significant challenge to fair evaluation. To fill this gap, we propose OpenFGL, a unified benchmark designed for the primary FGL scenarios: Graph-FL and Subgraph-FL. Specifically, OpenFGL includes 42 graph datasets from 18 application domains, 8 federated data simulation strategies that emphasize different graph properties, and 5 graph-based downstream tasks. Additionally, it offers 18 recently proposed SOTA FGL algorithms through a user-friendly API, enabling a thorough comparison and comprehensive evaluation of their effectiveness, robustness, and efficiency. Our empirical results demonstrate the capabilities of FGL while also highlighting its potential limitations, providing valuable insights for future research in this growing field, particularly in fostering greater interdisciplinary collaboration between FGL and data systems.
CROct 17, 2023
Privacy-Preserving Graph Embedding based on Local Differential PrivacyZening Li, Rong-Hua Li, Meihao Liao et al.
Graph embedding has become a powerful tool for learning latent representations of nodes in a graph. Despite its superior performance in various graph-based machine learning tasks, serious privacy concerns arise when the graph data contains personal or sensitive information. To address this issue, we investigate and develop graph embedding algorithms that satisfy local differential privacy (LDP). We introduce a novel privacy-preserving graph embedding framework, named PrivGE, to protect node data privacy. Specifically, we propose an LDP mechanism to obfuscate node data and utilize personalized PageRank as the proximity measure to learn node representations. Furthermore, we provide a theoretical analysis of the privacy guarantees and utility offered by the PrivGE framework. Extensive experiments on several real-world graph datasets demonstrate that PrivGE achieves an optimal balance between privacy and utility, and significantly outperforms existing methods in node classification and link prediction tasks.
LGFeb 2
DOGMA: Weaving Structural Information into Data-centric Single-cell Transcriptomics AnalysisRu Zhang, Xunkai Li, Yaxin Deng et al.
Recently, data-centric AI methodology has been a dominant paradigm in single-cell transcriptomics analysis, which treats data representation rather than model complexity as the fundamental bottleneck. In the review of current studies, earlier sequence methods treat cells as independent entities and adapt prevalent ML models to analyze their directly inherited sequence data. Despite their simplicity and intuition, these methods overlook the latent intercellular relationships driven by the functional mechanisms of biological systems and the inherent quality issues of the raw sequence data. Therefore, a series of structured methods has emerged. Although they employ various heuristic rules to capture intricate intercellular relationships and enhance the raw sequencing data, these methods often neglect biological prior knowledge. This omission incurs substantial overhead and yields suboptimal graph representations, thereby hindering the utility of ML models. To address them, we propose DOGMA, a holistic data-centric framework designed for the structural reshaping and semantic enhancement of raw data through multi-level biological prior knowledge. Transcending reliance on stochastic heuristics, DOGMA redefines graph construction by integrating Statistical Anchors with Cell Ontology and Phylogenetic Trees to enable deterministic structure discovery and robust cross-species alignment. Furthermore, Gene Ontology is utilized to bridge the feature-level semantic gap by incorporating functional priors. In complex multi-species and multi-organ benchmarks, DOGMA achieves SOTA performance, exhibiting superior zero-shot robustness and sample efficiency while operating with significantly lower computational cost.
LGJan 23
BoostFGL: Boosting Fairness in Federated Graph LearningZekai Chen, Kairui Yang, Xunkai Li et al.
Federated graph learning (FGL) enables collaborative training of graph neural networks (GNNs) across decentralized subgraphs without exposing raw data. While existing FGL methods often achieve high overall accuracy, we show that this average performance can conceal severe degradation on disadvantaged node groups. From a fairness perspective, these disparities arise systematically from three coupled sources: label skew toward majority patterns, topology confounding in message propagation, and aggregation dilution of updates from hard clients. To address this, we propose \textbf{BoostFGL}, a boosting-style framework for fairness-aware FGL. BoostFGL introduces three coordinated mechanisms: \ding{182} \emph{Client-side node boosting}, which reshapes local training signals to emphasize systematically under-served nodes; \ding{183} \emph{Client-side topology boosting}, which reallocates propagation emphasis toward reliable yet underused structures and attenuates misleading neighborhoods; and \ding{184} \emph{Server-side model boosting}, which performs difficulty- and reliability-aware aggregation to preserve informative updates from hard clients while stabilizing the global model. Extensive experiments on 9 datasets show that BoostFGL delivers substantial fairness gains, improving Overall-F1 by 8.43\%, while preserving competitive overall performance against strong FGL baselines.
LGJan 30
OptiMAG: Structure-Semantic Alignment via Unbalanced Optimal TransportYilong Zuo, Xunkai Li, Zhihan Zhang et al.
Multimodal Attributed Graphs (MAGs) have been widely adopted for modeling complex systems by integrating multi-modal information, such as text and images, on nodes. However, we identify a discrepancy between the implicit semantic structure induced by different modality embeddings and the explicit graph structure. For instance, neighbors in the explicit graph structure may be close in one modality but distant in another. Since existing methods typically perform message passing over the fixed explicit graph structure, they inadvertently aggregate dissimilar features, introducing modality-specific noise and impeding effective node representation learning. To address this, we propose OptiMAG, an Unbalanced Optimal Transport-based regularization framework. OptiMAG employs the Fused Gromov-Wasserstein distance to explicitly guide cross-modal structural consistency within local neighborhoods, effectively mitigating structural-semantic conflicts. Moreover, a KL divergence penalty enables adaptive handling of cross-modal inconsistencies. This framework can be seamlessly integrated into existing multimodal graph models, acting as an effective drop-in regularizer. Experiments demonstrate that OptiMAG consistently outperforms baselines across multiple tasks, ranging from graph-centric tasks (e.g., node classification, link prediction) to multimodal-centric generation tasks (e.g., graph2text, graph2image). The source code will be available upon acceptance.
51.1DBApr 24
MCI: A Maximal Clique Index for Efficient Arbitrary-Filtered Approximate Nearest Neighbor SearchXiaowei Ye, Rong-Hua Li, Guoren Wang et al.
Approximate Nearest Neighbor Search with arbitrary filtering predicates (AFANNS) is essential for modern data applications, yet existing methods often incur substantial storage and computational costs. In this work, we introduce the Maximal Clique Index (\mci), a novel graph-based index designed for robust and efficient AFANNS. The core idea of \mci is to approximate a dense Nearest Neighbor Graph (NNG) through a compact, clique-based representation. We propose two key techniques: (1) Maximal Clique Cover (\mcc), which exploits the geometric transitivity of high-dimensional spaces to encode dense neighborhoods as maximal cliques, achieving an index with high compression and connectivity; and (2) Local Neighborhood Graph Geometric Densification, a strategy that constructs an index approximating a large NNG from a sparse initial NNG, recovers global connectivity by progressively increasing distance thresholds to locally densify the structure. The index is built in a lock-free parallel manner for scalability and queried via a carefully-designed multi-seed strategy to handle fragmented predicate-induced subgraphs. Extensive experiments on 10 datasets show that \mci significantly outperforms state-of-the-art methods by up to one order of magnitude in QPS at high recall while using substantially smaller space, and remains competitive even on range/keyword filtering tasks, demonstrating robust general-purpose performance.
LGDec 17, 2024Code
Graph Learning in the Era of LLMs: A Survey from the Perspective of Data, Models, and TasksXunkai Li, Zhengyu Wu, Jiayi Wu et al.
With the increasing prevalence of cross-domain Text-Attributed Graph (TAG) Data (e.g., citation networks, recommendation systems, social networks, and ai4science), the integration of Graph Neural Networks (GNNs) and Large Language Models (LLMs) into a unified Model architecture (e.g., LLM as enhancer, LLM as collaborators, LLM as predictor) has emerged as a promising technological paradigm. The core of this new graph learning paradigm lies in the synergistic combination of GNNs' ability to capture complex structural relationships and LLMs' proficiency in understanding informative contexts from the rich textual descriptions of graphs. Therefore, we can leverage graph description texts with rich semantic context to fundamentally enhance Data quality, thereby improving the representational capacity of model-centric approaches in line with data-centric machine learning principles. By leveraging the strengths of these distinct neural network architectures, this integrated approach addresses a wide range of TAG-based Task (e.g., graph learning, graph reasoning, and graph question answering), particularly in complex industrial scenarios (e.g., supervised, few-shot, and zero-shot settings). In other words, we can treat text as a medium to enable cross-domain generalization of graph learning Model, allowing a single graph model to effectively handle the diversity of downstream graph-based Task across different data domains. This work serves as a foundational reference for researchers and practitioners looking to advance graph learning methodologies in the rapidly evolving landscape of LLM. We consistently maintain the related open-source materials at \url{https://github.com/xkLi-Allen/Awesome-GNN-in-LLMs-Papers}.
AIJan 29
LION: A Clifford Neural Paradigm for Multimodal-Attributed Graph LearningXunkai Li, Zhengyu Wu, Zekai Chen et al.
Recently, the rapid advancement of multimodal domains has driven a data-centric paradigm shift in graph ML, transitioning from text-attributed to multimodal-attributed graphs. This advancement significantly enhances data representation and expands the scope of graph downstream tasks, such as modality-oriented tasks, thereby improving the practical utility of graph ML. Despite its promise, limitations exist in the current neural paradigms: (1) Neglect Context in Modality Alignment: Most existing methods adopt topology-constrained or modality-specific operators as tokenizers. These aligners inevitably neglect graph context and inhibit modality interaction, resulting in suboptimal alignment. (2) Lack of Adaptation in Modality Fusion: Most existing methods are simple adaptations for 2-modality graphs and fail to adequately exploit aligned tokens equipped with topology priors during fusion, leading to poor generalizability and performance degradation. To address the above issues, we propose LION (c\underline{LI}ff\underline{O}rd \underline{N}eural paradigm) based on the Clifford algebra and decoupled graph neural paradigm (i.e., propagation-then-aggregation) to implement alignment-then-fusion in multimodal-attributed graphs. Specifically, we first construct a modality-aware geometric manifold grounded in Clifford algebra. This geometric-induced high-order graph propagation efficiently achieves modality interaction, facilitating modality alignment. Then, based on the geometric grade properties of aligned tokens, we propose adaptive holographic aggregation. This module integrates the energy and scale of geometric grades with learnable parameters to improve modality fusion. Extensive experiments on 9 datasets demonstrate that LION significantly outperforms SOTA baselines across 3 graph and 3 modality downstream tasks.
LGJan 23
DANCE: Dynamic, Available, Neighbor-gated Condensation for Federated Text-Attributed GraphsZekai Chen, Haodong Lu, Xunkai Li et al.
Federated graph learning (FGL) enables collaborative training on graph data across multiple clients. With the rise of large language models (LLMs), textual attributes in FGL graphs are gaining attention. Text-attributed graph federated learning (TAG-FGL) improves FGL by explicitly leveraging LLMs to process and integrate these textual features. However, current TAG-FGL methods face three main challenges: \textbf{(1) Overhead.} LLMs for processing long texts incur high token and computation costs. To make TAG-FGL practical, we introduce graph condensation (GC) to reduce computation load, but this choice also brings new issues. \textbf{(2) Suboptimal.} To reduce LLM overhead, we introduce GC into TAG-FGL by compressing multi-hop texts/neighborhoods into a condensed core with fixed LLM surrogates. However, this one-shot condensation is often not client-adaptive, leading to suboptimal performance. \textbf{(3) Interpretability.} LLM-based condensation further introduces a black-box bottleneck: summaries lack faithful attribution and clear grounding to specific source spans, making local inspection and auditing difficult. To address the above issues, we propose \textbf{DANCE}, a new TAG-FGL paradigm with GC. To improve \textbf{suboptimal} performance, DANCE performs round-wise, model-in-the-loop condensation refresh using the latest global model. To enhance \textbf{interpretability}, DANCE preserves provenance by storing locally inspectable evidence packs that trace predictions to selected neighbors and source text spans. Across 8 TAG datasets, DANCE improves accuracy by \textbf{2.33\%} at an \textbf{8\%} condensation ratio, with \textbf{33.42\%} fewer tokens than baselines.
IRFeb 26
TFPS: A Temporal Filtration-enhanced Positive Sample Set Construction Method for Implicit Collaborative FilteringJiayi Wu, Zhengyu Wu, Xunkai Li et al.
The negative sampling strategy can effectively train collaborative filtering (CF) recommendation models based on implicit feedback by constructing positive and negative samples. However, existing methods primarily optimize the negative sampling process while neglecting the exploration of positive samples. Some denoising recommendation methods can be applied to denoise positive samples within negative sampling strategies, but they ignore temporal information. Existing work integrates sequential information during model aggregation but neglects time interval information, hindering accurate capture of users' current preferences. To address this problem, from a data perspective, we propose a novel temporal filtration-enhanced approach to construct a high-quality positive sample set. First, we design a time decay model based on interaction time intervals, transforming the original graph into a weighted user-item bipartite graph. Then, based on predefined filtering operations, the weighted user-item bipartite graph is layered. Finally, we design a layer-enhancement strategy to construct a high-quality positive sample set for the layered subgraphs. We provide theoretical insights into why TFPS can improve Recall@k and NDCG@k, and extensive experiments on three real-world datasets demonstrate the effectiveness of the proposed method. Additionally, TFPS can be integrated with various implicit CF recommenders or negative sampling methods to enhance its performance.
LGFeb 4
Toward Effective Multimodal Graph Foundation Model: A Divide-and-Conquer Based ApproachSicheng Liu, Xunkai Li, Daohan Su et al.
Graph Foundation Models (GFMs) have achieved remarkable success in generalizing across diverse domains. However, they mainly focus on Text-Attributed Graphs (TAGs), leaving Multimodal-Attributed Graphs (MAGs) largely untapped. Developing Multimodal Graph Foundation Models (MGFMs) allows for leveraging the rich multimodal information in MAGs, and extends applicability to broader types of downstream tasks. While recent MGFMs integrate diverse modality information, our empirical investigation reveals two fundamental limitations of existing MGFMs: (1)they fail to explicitly model modality interaction, essential for capturing intricate cross-modal semantics beyond simple aggregation, and (2)they exhibit sub-optimal modality alignment, which is critical for bridging the significant semantic disparity between distinct modal spaces. To address these challenges, we propose PLANET (graPh topoLogy-aware modAlity iNteraction and alignmEnT), a novel framework employing a Divide-and-Conquer strategy to decouple modality interaction and alignment across distinct granularities. At the embedding granularity, (1)Embedding-wise Domain Gating (EDG) performs local semantic enrichment by adaptively infusing topology-aware cross-modal context, achieving modality interaction. At the node granularity, (2)Node-wise Discretization Retrieval (NDR) ensures global modality alignment by constructing a Discretized Semantic Representation Space (DSRS) to bridge modality gaps. Extensive experiments demonstrate that PLANET significantly outperforms state-of-the-art baselines across diverse graph-centric and multimodal generative tasks.
LGJan 30
Scalable Topology-Preserving Graph Coarsening with Graph CollapseXiang Wu, Rong-Hua Li, Xunkai Li et al.
Graph coarsening reduces the size of a graph while preserving certain properties. Most existing methods preserve either spectral or spatial characteristics. Recent research has shown that preserving topological features helps maintain the predictive performance of graph neural networks (GNNs) trained on the coarsened graph but suffers from exponential time complexity. To address these problems, we propose Scalable Topology-Preserving Graph Coarsening (STPGC) by introducing the concepts of graph strong collapse and graph edge collapse extended from algebraic topology. STPGC comprises three new algorithms, GStrongCollapse, GEdgeCollapse, and NeighborhoodConing based on these two concepts, which eliminate dominated nodes and edges while rigorously preserving topological features. We further prove that STPGC preserves the GNN receptive field and develop approximate algorithms to accelerate GNN training. Experiments on node classification with GNNs demonstrate the efficiency and effectiveness of STPGC.
69.6LGMay 15
GOMA: Toward Structure-Driven Multimodal Alignment from a Graph Signal Smoothing PerspectiveXu Wang, Xunkai Li, Yinlin Zhu et al.
Multimodal alignment is commonly learned from isolated image-text pairs via CLIP-style dual encoders, leaving the relational context among entities largely unused. Multimodal attributed graphs (MAGs), where nodes carry multimodal attributes and edges encode corpus structure, provide a natural setting for refining frozen vision-language embeddings. This refinement is challenging: visual, textual, and cross-modal relations often induce different neighborhood geometries, while unrestricted graph propagation can quickly over-smooth retrieval representations. Effectively leveraging graph context therefore requires simultaneously breaking modality-specific topological barriers, controlling the smoothing regime, and preserving informative smoothing before semantic boundaries collapse. We propose Graph-Optimized Multimodal Alignment (GOMA), a structure-driven post-alignment framework that views frozen multimodal embeddings as graph signals and addresses these requirements through a unified retrieval-oriented design. GOMA decouples three key design choices: where messages should flow, how multimodal evidence should propagate, and which smoothing depth should be retained. Concretely, it learns modality-aware propagation operators, performs finite-step coupled smoothing without diagonal cross-modal shortcuts, and adaptively reads out node-specific smoothing trajectories to preserve useful smoothing before collapse. All experiments follow a transductive MAG retrieval protocol where the graph serves only as unlabeled context and diagonal self-pair edges are removed. On seven MAG benchmarks, GOMA achieves state-of-the-art or tied state-of-the-art retrieval and remains substantially more stable than the strongest graph competitor, demonstrating that MAG structure can serve as an effective post-encoder for frozen multimodal embeddings.
LGJan 16
Theoretically and Practically Efficient Resistance Distance Computation on Large GraphsYichun Yang, Longlong Lin, Rong-Hua Li et al.
The computation of resistance distance is pivotal in a wide range of graph analysis applications, including graph clustering, link prediction, and graph neural networks. Despite its foundational importance, efficient algorithms for computing resistance distances on large graphs are still lacking. Existing state-of-the-art (SOTA) methods, including power iteration-based algorithms and random walk-based local approaches, often struggle with slow convergence rates, particularly when the condition number of the graph Laplacian matrix, denoted by $κ$, is large. To tackle this challenge, we propose two novel and efficient algorithms inspired by the classic Lanczos method: Lanczos Iteration and Lanczos Push, both designed to reduce dependence on $κ$. Among them, Lanczos Iteration is a near-linear time global algorithm, whereas Lanczos Push is a local algorithm with a time complexity independent of the size of the graph. More specifically, we prove that the time complexity of Lanczos Iteration is $\tilde{O}(\sqrtκ m)$ ($m$ is the number of edges of the graph and $\tilde{O}$ means the complexity omitting the $\log$ terms) which achieves a speedup of $\sqrtκ$ compared to previous power iteration-based global methods. For Lanczos Push, we demonstrate that its time complexity is $\tilde{O}(κ^{2.75})$ under certain mild and frequently established assumptions, which represents a significant improvement of $κ^{0.25}$ over the SOTA random walk-based local algorithms. We validate our algorithms through extensive experiments on eight real-world datasets of varying sizes and statistical properties, demonstrating that Lanczos Iteration and Lanczos Push significantly outperform SOTA methods in terms of both efficiency and accuracy.
LGOct 14, 2024Code
DiRW: Path-Aware Digraph Learning for HeterophilyDaohan Su, Xunkai Li, Zhenjun Li et al.
Recently, graph neural network (GNN) has emerged as a powerful representation learning tool for graph-structured data. However, most approaches are tailored for undirected graphs, neglecting the abundant information in the edges of directed graphs (digraphs). In fact, digraphs are widely applied in the real world and confirmed to address heterophily challenges. Despite recent advancements, existing spatial- and spectral-based DiGNNs have limitations due to their complex learning mechanisms and reliance on high-quality topology, resulting in low efficiency and unstable performance. To address these issues, we propose Directed Random Walk (DiRW), a plug-and-play strategy for most spatial-based DiGNNs and also an innovative model which offers a new digraph learning paradigm. Specifically, it utilizes a direction-aware path sampler optimized from the perspectives of walk probability, length, and number in a weight-free manner by considering node profiles and topologies. Building upon this, DiRW incorporates a node-wise learnable path aggregator for generalized node representations. Extensive experiments on 9 datasets demonstrate that DiRW: (1) enhances most spatial-based methods as a plug-and-play strategy; (2) achieves SOTA performance as a new digraph learning paradigm. The source code and data are available at https://github.com/dhsiuu/DiRW.
59.9LGMay 12
STAGE: Tackling Semantic Drift in Multimodal Federated Graph LearningZekai Chen, Xun Wu, Xunkai Li et al.
Federated graph learning (FGL) enables collaborative training on graph data across multiple clients. As graph data increasingly contain multimodal node attributes such as text and images, multimodal federated graph learning (MM-FGL) has become an important yet substantially harder setting. The key challenge is that clients from different modality domains may not share a common semantic space: even for the same concept, their local encoders can produce inconsistent representations before collaboration begins. This makes direct parameter coordination unreliable and further causes two downstream problems: forcing heterogeneous client representations into a naively shared semantic space may create false semantic agreement, and graph message passing may amplify residual inconsistency across neighborhoods. To address this issue, we propose \textbf{STAGE}, a protocol-first framework for MM-FGL. Instead of relying on direct parameter averaging, STAGE builds a shared semantic space that first translates heterogeneous multimodal features into comparable representations and then regulates how these representations propagate over local graph structures. In this way, STAGE not only improves cross-client semantic calibration, but also reduces the risk of inconsistency amplification during graph learning. Extensive experiments on 8 multimodal-attributed graphs across 5 graph-centric and modality-centric tasks show that STAGE consistently achieves state-of-the-art performance while reducing per-round communication payload.
51.5AIMay 12
CAMPA: Efficient and Aligned Multimodal Graph Learning via Decoupled Propagation and AggregationDaohan Su, Hao Liu, Xunkai Li et al.
Multimodal Graph Neural Networks (MGNNs) have shown strong potential for learning from multimodal attributed graphs, yet most existing approaches rely on tightly coupled architectures that suffer from prohibitive computational overhead. In this paper, we present a systematic empirical analysis showing that decoupled MGNNs are substantially more efficient and scalable for large-scale graph learning. However, we identify a critical bottleneck in existing decoupled pipelines, namely modal conflict, which arises in both the propagation and aggregation stages. Specifically, independent multi-hop diffusion causes cross-modal semantic divergence during propagation, while naive fusion fails to align multi-hop feature trajectories during aggregation, jointly limiting effective representation learning. To address this challenge, we propose CAMPA, a Cross-modal Aligned Multimodal Propagation & Aggregation framework for decoupled multimodal graph learning. Concretely, CAMPA introduces a two-stage alignment mechanism: (1) cross-modal aligned propagation, which injects cross-modal similarity priors into message passing to preserve semantic consistency without additional parameter overhead; (2) trajectory aligned aggregation, which leverages trajectory-level self-attention and cross-attention to capture and align long-range dependencies across modalities and hops. Extensive experiments on diverse benchmark datasets and tasks demonstrate that CAMPA consistently outperforms strong coupled and decoupled baselines while preserving the efficiency advantages of the decoupled paradigm.
59.4LGMay 12
Towards Robust Federated Multimodal Graph Learning under Modality HeterogeneitySirui Zhang, Haonan Wang, Xunkai Li et al.
Recently, multimodal graph learning (MGL) has garnered significant attention for integrating diverse modality information and structured context to support various network applications. However, real-world graphs are often isolated due to data-sharing limitations across multiple parties, and their modalities are frequently incomplete. This highlights an urgent need to develop a robust federated approach. However, we find that existing methods remain insufficient. On the one hand, centralized MGL methods that handle missing modalities overlook the knowledge sharing and generalization in federated scenarios. On the other hand, while federated MGL methods have become increasingly mature, they primarily target non-graph data. Based on these technologies, we identify a two-stage pipeline wherein client-side completion reconstructs missing modalities, and server-side aggregation integrates the client-updated parameters of both the modality generator and the backbone models. Although this serves as a general solution, we identify two primary challenges in achieving greater robustness: (1) Topology-Isolated Local Completion: Client-side modality generation struggles to effectively leverage global semantics. (2) Reliability-Imbalanced Global Aggregation: Server-side multi-party collaboration is hindered by client updates with varying modality availability and recovery reliability. To address these challenges, we propose \textsc{FedMPO}, which utilizes topology-aware cross-modal generation to recover missing features using comprehensive graph context, missing-aware expert routing to locally filter out noisy recovered signals, and reliability-aware aggregation to appropriately down-weight unreliable updates. Extensive experiments on 3 tasks across 6 datasets demonstrate that FedMPO outperforms baselines, achieving performance gains of up to 4.10% and 5.65% in high-missing and non-IID settings.
85.9DBMar 22
StreamTGN: A GPU-Efficient Serving System for Streaming Temporal Graph Neural NetworksLingling Zhang, Pengpeng Qiao, Zhiwei Zhang et al.
Temporal Graph Neural Networks (TGNs) achieve state-of-the-art performance on dynamic graph tasks, yet existing systems focus exclusively on accelerating training -- at inference time, every new edge triggers $O(|V|)$ embedding updates even though only a small fraction of nodes are affected. We present \textbf{StreamTGN}, the first streaming TGN inference system exploiting the inherent locality of temporal graph updates: in an $L$-layer TGN, a new edge affects only nodes within $L$ hops of the endpoints, typically less than 0.2\% on million-node graphs. StreamTGN maintains persistent GPU-resident node memory and uses dirty-flag propagation to identify the affected set $\mathcal{A}$, reducing per-batch complexity from $O(|V|)$ to $O(|\mathcal{A}|)$ with zero accuracy loss. Drift-aware adaptive rebuild scheduling and batched streaming with relaxed ordering further maximize throughput. Experiments on eight temporal graphs (2K--2.6M nodes) show 4.5$\times$--739$\times$ speedup for TGN and up to 4,207$\times$ for TGAT, with identical accuracy. StreamTGN is orthogonal to training optimizations: combining SWIFT with StreamTGN yields 24$\times$ end-to-end speedup across three architectures (TGN, TGAT, DySAT).
LGOct 14, 2025Code
Unveiling the Vulnerability of Graph-LLMs: An Interpretable Multi-Dimensional Adversarial Attack on TAGsBowen Fan, Zhilin Guo, Xunkai Li et al.
Graph Neural Networks (GNNs) have become a pivotal framework for modeling graph-structured data, enabling a wide range of applications from social network analysis to molecular chemistry. By integrating large language models (LLMs), text-attributed graphs (TAGs) enhance node representations with rich textual semantics, significantly boosting the expressive power of graph-based learning. However, this sophisticated synergy introduces critical vulnerabilities, as Graph-LLMs are susceptible to adversarial attacks on both their structural topology and textual attributes. Although specialized attack methods have been designed for each of these aspects, no work has yet unified them into a comprehensive approach. In this work, we propose the Interpretable Multi-Dimensional Graph Attack (IMDGA), a novel human-centric adversarial attack framework designed to orchestrate multi-level perturbations across both graph structure and textual features. IMDGA utilizes three tightly integrated modules to craft attacks that balance interpretability and impact, enabling a deeper understanding of Graph-LLM vulnerabilities. Through rigorous theoretical analysis and comprehensive empirical evaluations on diverse datasets and architectures, IMDGA demonstrates superior interpretability, attack effectiveness, stealthiness, and robustness compared to existing methods. By exposing critical weaknesses in TAG representation learning, this work uncovers a previously underexplored semantic dimension of vulnerability in Graph-LLMs, offering valuable insights for improving their resilience. Our code and resources are publicly available at https://anonymous.4open.science/r/IMDGA-7289.
CVJul 26, 2025Code
LAVA: Language Driven Scalable and Versatile Traffic Video AnalyticsYanrui Yu, Tianfei Zhou, Jiaxin Sun et al.
In modern urban environments, camera networks generate massive amounts of operational footage -- reaching petabytes each day -- making scalable video analytics essential for efficient processing. Many existing approaches adopt an SQL-based paradigm for querying such large-scale video databases; however, this constrains queries to rigid patterns with predefined semantic categories, significantly limiting analytical flexibility. In this work, we explore a language-driven video analytics paradigm aimed at enabling flexible and efficient querying of high-volume video data driven by natural language. Particularly, we build \textsc{Lava}, a system that accepts natural language queries and retrieves traffic targets across multiple levels of granularity and arbitrary categories. \textsc{Lava} comprises three main components: 1) a multi-armed bandit-based efficient sampling method for video segment-level localization; 2) a video-specific open-world detection module for object-level retrieval; and 3) a long-term object trajectory extraction scheme for temporal object association, yielding complete trajectories for object-of-interests. To support comprehensive evaluation, we further develop a novel benchmark by providing diverse, semantically rich natural language predicates and fine-grained annotations for multiple videos. Experiments on this benchmark demonstrate that \textsc{Lava} improves $F_1$-scores for selection queries by $\mathbf{14\%}$, reduces MPAE for aggregation queries by $\mathbf{0.39}$, and achieves top-$k$ precision of $\mathbf{86\%}$, while processing videos $ \mathbf{9.6\times} $ faster than the most accurate baseline. Our code and dataset are available at https://github.com/yuyanrui/LAVA.
LGApr 19, 2025Code
Rethinking Client-oriented Federated Graph LearningZekai Chen, Xunkai Li, Yinlin Zhu et al.
As a new distributed graph learning paradigm, Federated Graph Learning (FGL) facilitates collaborative model training across local systems while preserving data privacy. We review existing FGL approaches and categorize their optimization mechanisms into: (1) Server-Client (S-C), where clients upload local model parameters for server-side aggregation and global updates; (2) Client-Client (C-C), which allows direct exchange of information between clients and customizing their local training process. We reveal that C-C shows superior potential due to its refined communication structure. However, existing C-C methods broadcast redundant node representations, incurring high communication costs and privacy risks at the node level. To this end, we propose FedC4, which combines graph Condensation with C-C Collaboration optimization. Specifically, FedC4 employs graph condensation technique to refine the knowledge of each client's graph into a few synthetic embeddings instead of transmitting node-level knowledge. Moreover, FedC4 introduces three novel modules that allow the source client to send distinct node representations tailored to the target client's graph properties. Experiments on eight public real-world datasets show that FedC4 outperforms state-of-the-art baselines in both task performance and communication cost. Our code is now available on https://github.com/Ereshkigal1/FedC4.
LGDec 2, 2021Code
Active Learning for Domain Adaptation: An Energy-Based ApproachBinhui Xie, Longhui Yuan, Shuang Li et al.
Unsupervised domain adaptation has recently emerged as an effective paradigm for generalizing deep neural networks to new target domains. However, there is still enormous potential to be tapped to reach the fully supervised performance. In this paper, we present a novel active learning strategy to assist knowledge transfer in the target domain, dubbed active domain adaptation. We start from an observation that energy-based models exhibit \textit{free energy biases} when training (source) and test (target) data come from different distributions. Inspired by this inherent mechanism, we empirically reveal that a simple yet efficient energy-based sampling strategy sheds light on selecting the most valuable target samples than existing approaches requiring particular architectures or computation of the distances. Our algorithm, Energy-based Active Domain Adaptation (EADA), queries groups of target data that incorporate both domain characteristic and instance uncertainty into every selection round. Meanwhile, by aligning the free energy of target data compact around the source domain via a regularization term, domain gap can be implicitly diminished. Through extensive experiments, we show that EADA surpasses state-of-the-art methods on well-known challenging benchmarks with substantial improvements, making it a useful option in the open world. Code is available at https://github.com/BIT-DA/EADA.
CVMay 11, 2021Code
Semantic Distribution-aware Contrastive Adaptation for Semantic SegmentationShuang Li, Binhui Xie, Bin Zang et al.
Domain adaptive semantic segmentation refers to making predictions on a certain target domain with only annotations of a specific source domain. Current state-of-the-art works suggest that performing category alignment can alleviate domain shift reasonably. However, they are mainly based on image-to-image adversarial training and little consideration is given to semantic variations of an object among images, failing to capture a comprehensive picture of different categories. This motivates us to explore a holistic representative, the semantic distribution from each category in source domain, to mitigate the problem above. In this paper, we present semantic distribution-aware contrastive adaptation algorithm that enables pixel-wise representation alignment under the guidance of semantic distributions. Specifically, we first design a pixel-wise contrastive loss by considering the correspondences between semantic distributions and pixel-wise representations from both domains. Essentially, clusters of pixel representations from the same category should cluster together and those from different categories should spread out. Next, an upper bound on this formulation is derived by involving the learning of an infinite number of (dis)similar pairs, making it efficient. Finally, we verify that SDCA can further improve segmentation accuracy when integrated with the self-supervised learning. We evaluate SDCA on multiple benchmarks, achieving considerable improvements over existing algorithms.The code is publicly available at https://github.com/BIT-DA/SDCA
LGJan 22, 2024
FedGTA: Topology-aware Averaging for Federated Graph LearningXunkai Li, Zhengyu Wu, Wentao Zhang et al.
Federated Graph Learning (FGL) is a distributed machine learning paradigm that enables collaborative training on large-scale subgraphs across multiple local systems. Existing FGL studies fall into two categories: (i) FGL Optimization, which improves multi-client training in existing machine learning models; (ii) FGL Model, which enhances performance with complex local models and multi-client interactions. However, most FGL optimization strategies are designed specifically for the computer vision domain and ignore graph structure, presenting dissatisfied performance and slow convergence. Meanwhile, complex local model architectures in FGL Models studies lack scalability for handling large-scale subgraphs and have deployment limitations. To address these issues, we propose Federated Graph Topology-aware Aggregation (FedGTA), a personalized optimization strategy that optimizes through topology-aware local smoothing confidence and mixed neighbor features. During experiments, we deploy FedGTA in 12 multi-scale real-world datasets with the Louvain and Metis split. This allows us to evaluate the performance and robustness of FedGTA across a range of scenarios. Extensive experiments demonstrate that FedGTA achieves state-of-the-art performance while exhibiting high scalability and efficiency. The experiment includes ogbn-papers100M, the most representative large-scale graph database so that we can verify the applicability of our method to large-scale graph learning. To the best of our knowledge, our study is the first to bridge large-scale graph learning with FGL using this optimization strategy, contributing to the development of efficient and scalable FGL methods.
LGJan 22, 2024
Towards Effective and General Graph Unlearning via Mutual EvolutionXunkai Li, Yulin Zhao, Zhengyu Wu et al.
With the rapid advancement of AI applications, the growing needs for data privacy and model robustness have highlighted the importance of machine unlearning, especially in thriving graph-based scenarios. However, most existing graph unlearning strategies primarily rely on well-designed architectures or manual process, rendering them less user-friendly and posing challenges in terms of deployment efficiency. Furthermore, striking a balance between unlearning performance and framework generalization is also a pivotal concern. To address the above issues, we propose \underline{\textbf{M}}utual \underline{\textbf{E}}volution \underline{\textbf{G}}raph \underline{\textbf{U}}nlearning (MEGU), a new mutual evolution paradigm that simultaneously evolves the predictive and unlearning capacities of graph unlearning. By incorporating aforementioned two components, MEGU ensures complementary optimization in a unified training framework that aligns with the prediction and unlearning requirements. Extensive experiments on 9 graph benchmark datasets demonstrate the superior performance of MEGU in addressing unlearning requirements at the feature, node, and edge levels. Specifically, MEGU achieves average performance improvements of 2.7\%, 2.5\%, and 3.2\% across these three levels of unlearning tasks when compared to state-of-the-art baselines. Furthermore, MEGU exhibits satisfactory training efficiency, reducing time and space overhead by an average of 159.8x and 9.6x, respectively, in comparison to retraining GNN from scratch.
LGJan 22, 2024
AdaFGL: A New Paradigm for Federated Node Classification with Topology HeterogeneityXunkai Li, Zhengyu Wu, Wentao Zhang et al.
Recently, Federated Graph Learning (FGL) has attracted significant attention as a distributed framework based on graph neural networks, primarily due to its capability to break data silos. Existing FGL studies employ community split on the homophilous global graph by default to simulate federated semi-supervised node classification settings. Such a strategy assumes the consistency of topology between the multi-client subgraphs and the global graph, where connected nodes are highly likely to possess similar feature distributions and the same label. However, in real-world implementations, the varying perspectives of local data engineering result in various subgraph topologies, posing unique heterogeneity challenges in FGL. Unlike the well-known label Non-independent identical distribution (Non-iid) problems in federated learning, FGL heterogeneity essentially reveals the topological divergence among multiple clients, namely homophily or heterophily. To simulate and handle this unique challenge, we introduce the concept of structure Non-iid split and then present a new paradigm called \underline{Ada}ptive \underline{F}ederated \underline{G}raph \underline{L}earning (AdaFGL), a decoupled two-step personalized approach. To begin with, AdaFGL employs standard multi-client federated collaborative training to acquire the federated knowledge extractor by aggregating uploaded models in the final round at the server. Then, each client conducts personalized training based on the local subgraph and the federated knowledge extractor. Extensive experiments on the 12 graph benchmark datasets validate the superior performance of AdaFGL over state-of-the-art baselines. Specifically, in terms of test accuracy, our proposed AdaFGL outperforms baselines by significant margins of 3.24\% and 5.57\% on community split and structure Non-iid split, respectively.
LGDec 7, 2023
Breaking the Entanglement of Homophily and Heterophily in Semi-supervised Node ClassificationHenan Sun, Xunkai Li, Zhengyu Wu et al.
Recently, graph neural networks (GNNs) have shown prominent performance in semi-supervised node classification by leveraging knowledge from the graph database. However, most existing GNNs follow the homophily assumption, where connected nodes are more likely to exhibit similar feature distributions and the same labels, and such an assumption has proven to be vulnerable in a growing number of practical applications. As a supplement, heterophily reflects dissimilarity in connected nodes, which has gained significant attention in graph learning. To this end, data engineers aim to develop a powerful GNN model that can ensure performance under both homophily and heterophily. Despite numerous attempts, most existing GNNs struggle to achieve optimal node representations due to the constraints of undirected graphs. The neglect of directed edges results in sub-optimal graph representations, thereby hindering the capacity of GNNs. To address this issue, we introduce AMUD, which quantifies the relationship between node profiles and topology from a statistical perspective, offering valuable insights for Adaptively Modeling the natural directed graphs as the Undirected or Directed graph to maximize the benefits from subsequent graph learning. Furthermore, we propose Adaptive Directed Pattern Aggregation (ADPA) as a new directed graph learning paradigm for AMUD. Empirical studies have demonstrated that AMUD guides efficient graph learning. Meanwhile, extensive experiments on 16 benchmark datasets substantiate the impressive performance of ADPA, outperforming baselines by significant margins of 3.96.
LGFeb 9, 2024
Rethinking Node-wise Propagation for Large-scale Graph LearningXunkai Li, Jingyuan Ma, Zhengyu Wu et al.
Scalable graph neural networks (GNNs) have emerged as a promising technique, which exhibits superior predictive performance and high running efficiency across numerous large-scale graph-based web applications. However, (i) Most scalable GNNs tend to treat all nodes in graphs with the same propagation rules, neglecting their topological uniqueness; (ii) Existing node-wise propagation optimization strategies are insufficient on web-scale graphs with intricate topology, where a full portrayal of nodes' local properties is required. Intuitively, different nodes in web-scale graphs possess distinct topological roles, and therefore propagating them indiscriminately or neglect local contexts may compromise the quality of node representations. This intricate topology in web-scale graphs cannot be matched by small-scale scenarios. To address the above issues, we propose \textbf{A}daptive \textbf{T}opology-aware \textbf{P}ropagation (ATP), which reduces potential high-bias propagation and extracts structural patterns of each node in a scalable manner to improve running efficiency and predictive performance. Remarkably, ATP is crafted to be a plug-and-play node-wise propagation optimization strategy, allowing for offline execution independent of the graph learning process in a new perspective. Therefore, this approach can be seamlessly integrated into most scalable GNNs while remain orthogonal to existing node-wise propagation optimization strategies. Extensive experiments on 12 datasets, including the most representative large-scale ogbn-papers100M, have demonstrated the effectiveness of ATP. Specifically, ATP has proven to be efficient in improving the performance of prevalent scalable GNNs for semi-supervised node classification while addressing redundant computational costs.
CVMar 2, 2024
On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous DrivingKaituo Feng, Changsheng Li, Dongchun Ren et al.
End-to-end motion planning models equipped with deep neural networks have shown great potential for enabling full autonomous driving. However, the oversized neural networks render them impractical for deployment on resource-constrained systems, which unavoidably requires more computational time and resources during reference.To handle this, knowledge distillation offers a promising approach that compresses models by enabling a smaller student model to learn from a larger teacher model. Nevertheless, how to apply knowledge distillation to compress motion planners has not been explored so far. In this paper, we propose PlanKD, the first knowledge distillation framework tailored for compressing end-to-end motion planners. First, considering that driving scenes are inherently complex, often containing planning-irrelevant or even noisy information, transferring such information is not beneficial for the student planner. Thus, we design an information bottleneck based strategy to only distill planning-relevant information, rather than transfer all information indiscriminately. Second, different waypoints in an output planned trajectory may hold varying degrees of importance for motion planning, where a slight deviation in certain crucial waypoints might lead to a collision. Therefore, we devise a safety-aware waypoint-attentive distillation module that assigns adaptive weights to different waypoints based on the importance, to encourage the student to accurately mimic more crucial waypoints, thereby improving overall safety. Experiments demonstrate that our PlanKD can boost the performance of smaller planners by a large margin, and significantly reduce their reference time.
LGApr 1, 2025
GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited EnvironmentsEnjun Du, Xunkai Li, Tian Jin et al.
The era of foundation models has revolutionized AI research, yet Graph Foundation Models (GFMs) remain constrained by the scarcity of large-scale graph corpora. Traditional graph data synthesis techniques primarily focus on simplistic structural operations, lacking the capacity to generate semantically rich nodes with meaningful textual attributes: a critical limitation for real-world applications. While large language models (LLMs) demonstrate exceptional text generation capabilities, their direct application to graph synthesis is impeded by context window limitations, hallucination phenomena, and structural consistency challenges. To address these issues, we introduce GraphMaster, the first multi-agent framework specifically designed for graph data synthesis in data-limited environments. GraphMaster orchestrates four specialized LLM agents (Manager, Perception, Enhancement, and Evaluation) that collaboratively optimize the synthesis process through iterative refinement, ensuring both semantic coherence and structural integrity. To rigorously evaluate our approach, we create new data-limited "Sub" variants of six standard graph benchmarks, specifically designed to test synthesis capabilities under realistic constraints. Additionally, we develop a novel interpretability assessment framework that combines human evaluation with a principled Grassmannian manifold-based analysis, providing both qualitative and quantitative measures of semantic coherence. Experimental results demonstrate that GraphMaster significantly outperforms traditional synthesis methods across multiple datasets, establishing a strong foundation for advancing GFMs in data-scarce environments.
LGJan 22, 2024
LightDiC: A Simple yet Effective Approach for Large-scale Digraph Representation LearningXunkai Li, Meihao Liao, Zhengyu Wu et al.
Most existing graph neural networks (GNNs) are limited to undirected graphs, whose restricted scope of the captured relational information hinders their expressive capabilities and deployments in real-world scenarios. Compared with undirected graphs, directed graphs (digraphs) fit the demand for modeling more complex topological systems by capturing more intricate relationships between nodes, such as formulating transportation and financial networks. While some directed GNNs have been introduced, their inspiration mainly comes from deep learning architectures, which lead to redundant complexity and computation, making them inapplicable to large-scale databases. To address these issues, we propose LightDiC, a scalable variant of the digraph convolution based on the magnetic Laplacian. Since topology-related computations are conducted solely during offline pre-processing, LightDiC achieves exceptional scalability, enabling downstream predictions to be trained separately without incurring recursive computational costs. Theoretical analysis shows that LightDiC utilizes directed information to achieve message passing based on the complex field, which corresponds to the proximal gradient descent process of the Dirichlet energy optimization function from the perspective of digraph signal denoising, ensuring its expressiveness. Experimental results demonstrate that LightDiC performs comparably well or even outperforms other SOTA methods in various downstream tasks, with fewer learnable parameters and higher training efficiency. Notably, LightDiC is the first DiGNN to provide satisfactory results in the most representative large-scale database (ogbn-papers100M).
CVDec 10, 2024
ITPNet: Towards Instantaneous Trajectory Prediction for Autonomous DrivingRongqing Li, Changsheng Li, Yuhang Li et al.
Trajectory prediction of agents is crucial for the safety of autonomous vehicles, whereas previous approaches usually rely on sufficiently long-observed trajectory to predict the future trajectory of the agents. However, in real-world scenarios, it is not realistic to collect adequate observed locations for moving agents, leading to the collapse of most prediction models. For instance, when a moving car suddenly appears and is very close to an autonomous vehicle because of the obstruction, it is quite necessary for the autonomous vehicle to quickly and accurately predict the future trajectories of the car with limited observed trajectory locations. In light of this, we focus on investigating the task of instantaneous trajectory prediction, i.e., two observed locations are available during inference. To this end, we propose a general and plug-and-play instantaneous trajectory prediction approach, called ITPNet. Specifically, we propose a backward forecasting mechanism to reversely predict the latent feature representations of unobserved historical trajectories of the agent based on its two observed locations and then leverage them as complementary information for future trajectory prediction. Meanwhile, due to the inevitable existence of noise and redundancy in the predicted latent feature representations, we further devise a Noise Redundancy Reduction Former, aiming at to filter out noise and redundancy from unobserved trajectories and integrate the filtered features and observed features into a compact query for future trajectory predictions. In essence, ITPNet can be naturally compatible with existing trajectory prediction models, enabling them to gracefully handle the case of instantaneous trajectory prediction. Extensive experiments on the Argoverse and nuScenes datasets demonstrate ITPNet outperforms the baselines, and its efficacy with different trajectory prediction models.
LGJan 6, 2025
OpenGU: A Comprehensive Benchmark for Graph UnlearningBowen Fan, Yuming Ai, Xunkai Li et al.
Graph Machine Learning is essential for understanding and analyzing relational data. However, privacy-sensitive applications demand the ability to efficiently remove sensitive information from trained graph neural networks (GNNs), avoiding the unnecessary time and space overhead caused by retraining models from scratch. To address this issue, Graph Unlearning (GU) has emerged as a critical solution, with the potential to support dynamic graph updates in data management systems and enable scalable unlearning in distributed data systems while ensuring privacy compliance. Unlike machine unlearning in computer vision or other fields, GU faces unique difficulties due to the non-Euclidean nature of graph data and the recursive message-passing mechanism of GNNs. Additionally, the diversity of downstream tasks and the complexity of unlearning requests further amplify these challenges. Despite the proliferation of diverse GU strategies, the absence of a benchmark providing fair comparisons for GU, and the limited flexibility in combining downstream tasks and unlearning requests, have yielded inconsistencies in evaluations, hindering the development of this domain. To fill this gap, we present OpenGU, the first GU benchmark, where 16 SOTA GU algorithms and 37 multi-domain datasets are integrated, enabling various downstream tasks with 13 GNN backbones when responding to flexible unlearning requests. Based on this unified benchmark framework, we are able to provide a comprehensive and fair evaluation for GU. Through extensive experimentation, we have drawn $8$ crucial conclusions about existing GU methods, while also gaining valuable insights into their limitations, shedding light on potential avenues for future research.
LGNov 28, 2024
Towards Data-centric Machine Learning on Directed Graphs: a SurveyHenan Sun, Xunkai Li, Daohan Su et al.
In recent years, Graph Neural Networks (GNNs) have made significant advances in processing structured data. However, most of them primarily adopted a model-centric approach, which simplifies graphs by converting them into undirected formats and emphasizes model designs. This approach is inherently limited in real-world applications due to the unavoidable information loss in simple undirected graphs and the model optimization challenges that arise when exceeding the upper bounds of this sub-optimal data representational capacity. As a result, there has been a shift toward data-centric methods that prioritize improving graph quality and representation. Specifically, various types of graphs can be derived from naturally structured data, including heterogeneous graphs, hypergraphs, and directed graphs. Among these, directed graphs offer distinct advantages in topological systems by modeling causal relationships, and directed GNNs have been extensively studied in recent years. However, a comprehensive survey of this emerging topic is still lacking. Therefore, we aim to provide a comprehensive review of directed graph learning, with a particular focus on a data-centric perspective. Specifically, we first introduce a novel taxonomy for existing studies. Subsequently, we re-examine these methods from the data-centric perspective, with an emphasis on understanding and improving data representation. It demonstrates that a deep understanding of directed graphs and their quality plays a crucial role in model performance. Additionally, we explore the diverse applications of directed GNNs across 10+ domains, highlighting their broad applicability. Finally, we identify key opportunities and challenges within the field, offering insights that can guide future research and development in directed graph learning.