h-index44
34papers
663citations
Novelty53%
AI Score60

34 Papers

IRJul 12, 2022Code
Multi-Behavior Hypergraph-Enhanced Transformer for Sequential Recommendation

Yuhao Yang, Chao Huang, Lianghao Xia et al.

Learning dynamic user preference has become an increasingly important component for many online platforms (e.g., video-sharing sites, e-commerce systems) to make sequential recommendations. Previous works have made many efforts to model item-item transitions over user interaction sequences, based on various architectures, e.g., recurrent neural networks and self-attention mechanism. Recently emerged graph neural networks also serve as useful backbone models to capture item dependencies in sequential recommendation scenarios. Despite their effectiveness, existing methods have far focused on item sequence representation with singular type of interactions, and thus are limited to capture dynamic heterogeneous relational structures between users and items (e.g., page view, add-to-favorite, purchase). To tackle this challenge, we design a Multi-Behavior Hypergraph-enhanced Transformer framework (MBHT) to capture both short-term and long-term cross-type behavior dependencies. Specifically, a multi-scale Transformer is equipped with low-rank self-attention to jointly encode behavior-aware sequential patterns from fine-grained and coarse-grained levels. Additionally, we incorporate the global multi-behavior dependency into the hypergraph neural architecture to capture the hierarchical long-range item correlations in a customized manner. Experimental results demonstrate the superiority of our MBHT over various state-of-the-art recommendation solutions across different settings. Further ablation studies validate the effectiveness of our model design and benefits of the new MBHT framework. Our implementation code is released at: https://github.com/yuh-yang/MBHT-KDD22.

LGMay 8, 2022Code
Mutual Distillation Learning Network for Trajectory-User Linking

Wei Chen, Shuzhe Li, Chao Huang et al.

Trajectory-User Linking (TUL), which links trajectories to users who generate them, has been a challenging problem due to the sparsity in check-in mobility data. Existing methods ignore the utilization of historical data or rich contextual features in check-in data, resulting in poor performance for TUL task. In this paper, we propose a novel Mutual distillation learning network to solve the TUL problem for sparse check-in mobility data, named MainTUL. Specifically, MainTUL is composed of a Recurrent Neural Network (RNN) trajectory encoder that models sequential patterns of input trajectory and a temporal-aware Transformer trajectory encoder that captures long-term time dependencies for the corresponding augmented historical trajectories. Then, the knowledge learned on historical trajectories is transferred between the two trajectory encoders to guide the learning of both encoders to achieve mutual distillation of information. Experimental results on two real-world check-in mobility datasets demonstrate the superiority of MainTUL against state-of-the-art baselines. The source code of our model is available at https://github.com/Onedean/MainTUL.

LGFeb 11, 2023Code
Trajectory-User Linking via Hierarchical Spatio-Temporal Attention Networks

Wei Chen, Chao Huang, Yanwei Yu et al.

Trajectory-User Linking (TUL) is crucial for human mobility modeling by linking diferent trajectories to users with the exploration of complex mobility patterns. Existing works mainly rely on the recurrent neural framework to encode the temporal dependencies in trajectories, have fall short in capturing spatial-temporal global context for TUL prediction. To ill this gap, this work presents a new hierarchical spatio-temporal attention neural network, called AttnTUL, to jointly encode the local trajectory transitional patterns and global spatial dependencies for TUL. Speciically, our irst model component is built over the graph neural architecture to preserve the local and global context and enhance the representation paradigm of geographical regions and user trajectories. Additionally, a hierarchically structured attention network is designed to simultaneously encode the intra-trajectory and inter-trajectory dependencies, with the integration of the temporal attention mechanism and global elastic attentional encoder. Extensive experiments demonstrate the superiority of our AttnTUL method as compared to state-of-the-art baselines on various trajectory datasets. The source code of our model is available at https://github.com/Onedean/AttnTUL.

LGApr 20, 2022Code
Scalable Motif Counting for Large-scale Temporal Graphs

Zhongqiang Gao, Chuanqi Cheng, Yanwei Yu et al.

One fundamental problem in temporal graph analysis is to count the occurrences of small connected subgraph patterns (i.e., motifs), which benefits a broad range of real-world applications, such as anomaly detection, structure prediction, and network representation learning. However, existing works focused on exacting temporal motif are not scalable to large-scale temporal graph data, due to their heavy computational costs or inherent inadequacy of parallelism. In this work, we propose a scalable parallel framework for exactly counting temporal motifs in large-scale temporal graphs. We first categorize the temporal motifs based on their distinct properties, and then design customized algorithms that offer efficient strategies to exactly count the motif instances of each category. Moreover, our compact data structures, namely triple and quadruple counters, enable our algorithms to directly identify the temporal motif instances of each category, according to edge information and the relationship between edges, therefore significantly improving the counting efficiency. Based on the proposed counting algorithms, we design a hierarchical parallel framework that features both inter- and intra-node parallel strategies, and fully leverages the multi-threading capacity of modern CPU to concurrently count all temporal motifs. Extensive experiments on sixteen real-world temporal graph datasets demonstrate the superiority and capability of our proposed framework for temporal motif counting, achieving up to 538* speedup compared to the state-of-the-art methods. The source code of our method is available at: https://github.com/steven-ccq/FAST-temporal-motif.

SIAug 12, 2022
Multiplex Heterogeneous Graph Convolutional Network

Pengyang Yu, Chaofan Fu, Yanwei Yu et al.

Heterogeneous graph convolutional networks have gained great popularity in tackling various network analytical tasks on heterogeneous network data, ranging from link prediction to node classification. However, most existing works ignore the relation heterogeneity with multiplex network between multi-typed nodes and different importance of relations in meta-paths for node embedding, which can hardly capture the heterogeneous structure signals across different relations. To tackle this challenge, this work proposes a Multiplex Heterogeneous Graph Convolutional Network (MHGCN) for heterogeneous network embedding. Our MHGCN can automatically learn the useful heterogeneous meta-path interactions of different lengths in multiplex heterogeneous networks through multi-layer convolution aggregation. Additionally, we effectively integrate both multi-relation structural signals and attribute semantics into the learned node embeddings with both unsupervised and semi-supervised learning paradigms. Extensive experiments on five real-world datasets with various network analytical tasks demonstrate the significant superiority of MHGCN against state-of-the-art embedding baselines in terms of all evaluation metrics.

LGFeb 9Code
ShapeCond: Fast Shapelet-Guided Dataset Condensation for Time Series Classification

Sijia Peng, Yun Xiong, Xi Chen et al.

Time series data supports many domains (e.g., finance and climate science), but its rapid growth strains storage and computation. Dataset condensation can alleviate this by synthesizing a compact training set that preserves key information. Yet most condensation methods are image-centric and often fail on time series because they miss time-series-specific temporal structure, especially local discriminative motifs such as shapelets. In this work, we propose ShapeCond, a novel and efficient condensation framework for time series classification that leverages shapelet-based dataset knowledge via a shapelet-guided optimization strategy. Our shapelet-assisted synthesis cost is independent of sequence length: longer series yield larger speedups in synthesis (e.g., 29$\times$ faster over prior state-of-the-art method CondTSC for time-series condensation, and up to 10,000$\times$ over naively using shapelets on the Sleep dataset with 3,000 timesteps). By explicitly preserving critical local patterns, ShapeCond improves downstream accuracy and consistently outperforms all prior state-of-the-art time series dataset condensation methods across extensive experiments. Code is available at https://github.com/lunaaa95/ShapeCond.

LGAug 17, 2023
Cross-city Few-Shot Traffic Forecasting via Traffic Pattern Bank

Zhanyu Liu, Guanjie Zheng, Yanwei Yu

Traffic forecasting is a critical service in Intelligent Transportation Systems (ITS). Utilizing deep models to tackle this task relies heavily on data from traffic sensors or vehicle devices, while some cities might lack device support and thus have few available data. So, it is necessary to learn from data-rich cities and transfer the knowledge to data-scarce cities in order to improve the performance of traffic forecasting. To address this problem, we propose a cross-city few-shot traffic forecasting framework via Traffic Pattern Bank (TPB) due to that the traffic patterns are similar across cities. TPB utilizes a pre-trained traffic patch encoder to project raw traffic data from data-rich cities into high-dimensional space, from which a traffic pattern bank is generated through clustering. Then, the traffic data of the data-scarce city could query the traffic pattern bank and explicit relations between them are constructed. The metaknowledge is aggregated based on these relations and an adjacency matrix is constructed to guide a downstream spatial-temporal model in forecasting future traffic. The frequently used meta-training framework Reptile is adapted to find a better initial parameter for the learnable modules. Experiments on real-world traffic datasets show that TPB outperforms existing methods and demonstrates the effectiveness of our approach in cross-city few-shot traffic forecasting.

CVFeb 11Code
Enhancing Underwater Images via Adaptive Semantic-aware Codebook Learning

Bosen Lin, Feng Gao, Yanwei Yu et al.

Underwater Image Enhancement (UIE) is an ill-posed problem where natural clean references are not available, and the degradation levels vary significantly across semantic regions. Existing UIE methods treat images with a single global model and ignore the inconsistent degradation of different scene components. This oversight leads to significant color distortions and loss of fine details in heterogeneous underwater scenes, especially where degradation varies significantly across different image regions. Therefore, we propose SUCode (Semantic-aware Underwater Codebook Network), which achieves adaptive UIE from semantic-aware discrete codebook representation. Compared with one-shot codebook-based methods, SUCode exploits semantic-aware, pixel-level codebook representation tailored to heterogeneous underwater degradation. A three-stage training paradigm is employed to represent raw underwater image features to avoid pseudo ground-truth contamination. Gated Channel Attention Module (GCAM) and Frequency-Aware Feature Fusion (FAFF) jointly integrate channel and frequency cues for faithful color restoration and texture recovery. Extensive experiments on multiple benchmarks demonstrate that SUCode achieves state-of-the-art performance, outperforming recent UIE methods on both reference and no-reference metrics. The code will be made public available at https://github.com/oucailab/SUCode.

CVMar 2Code
Downstream Task Inspired Underwater Image Enhancement: A Perception-Aware Study from Dataset Construction to Network Design

Bosen Lin, Feng Gao, Yanwei Yu et al.

In real underwater environments, downstream image recognition tasks such as semantic segmentation and object detection often face challenges posed by problems like blurring and color inconsistencies. Underwater image enhancement (UIE) has emerged as a promising preprocessing approach, aiming to improve the recognizability of targets in underwater images. However, most existing UIE methods mainly focus on enhancing images for human visual perception, frequently failing to reconstruct high-frequency details that are critical for task-specific recognition. To address this issue, we propose a Downstream Task-Inspired Underwater Image Enhancement (DTI-UIE) framework, which leverages human visual perception model to enhance images effectively for underwater vision tasks. Specifically, we design an efficient two-branch network with task-aware attention module for feature mixing. The network benefits from a multi-stage training framework and a task-driven perceptual loss. Additionally, inspired by human perception, we automatically construct a Task-Inspired UIE Dataset (TI-UIED) using various task-specific networks. Experimental results demonstrate that DTI-UIE significantly improves task performance by generating preprocessed images that are beneficial for downstream tasks such as semantic segmentation, object detection, and instance segmentation. The codes are publicly available at https://github.com/oucailab/DTIUIE.

LGMar 21, 2024Code
Deep Learning for Trajectory Data Management and Mining: A Survey and Beyond

Wei Chen, Yuxuan Liang, Yuanshao Zhu et al.

Trajectory computing is a pivotal domain encompassing trajectory data management and mining, garnering widespread attention due to its crucial role in various practical applications such as location services, urban traffic, and public safety. Traditional methods, focusing on simplistic spatio-temporal features, face challenges of complex calculations, limited scalability, and inadequate adaptability to real-world complexities. In this paper, we present a comprehensive review of the development and recent advances in deep learning for trajectory computing (DL4Traj). We first define trajectory data and provide a brief overview of widely-used deep learning models. Systematically, we explore deep learning applications in trajectory management (pre-processing, storage, analysis, and visualization) and mining (trajectory-related forecasting, trajectory-related recommendation, trajectory classification, travel time estimation, anomaly detection, and mobility generation). Notably, we encapsulate recent advancements in Large Language Models (LLMs) that hold the potential to augment trajectory computing. Additionally, we summarize application scenarios, public datasets, and toolkits. Finally, we outline current challenges in DL4Traj research and propose future directions. Relevant papers and open-source resources have been collated and are continuously updated at: \href{https://github.com/yoshall/Awesome-Trajectory-Computing}{DL4Traj Repo}.

LGFeb 1, 2024Code
Multi-scale Traffic Pattern Bank for Cross-city Few-shot Traffic Forecasting

Zhanyu Liu, Guanjie Zheng, Yanwei Yu

Traffic forecasting is crucial for intelligent transportation systems (ITS), aiding in efficient resource allocation and effective traffic control. However, its effectiveness often relies heavily on abundant traffic data, while many cities lack sufficient data due to limited device support, posing a significant challenge for traffic forecasting. Recognizing this challenge, we have made a noteworthy observation: traffic patterns exhibit similarities across diverse cities. Building on this key insight, we propose a solution for the cross-city few-shot traffic forecasting problem called Multi-scale Traffic Pattern Bank (MTPB). Primarily, MTPB initiates its learning process by leveraging data-rich source cities, effectively acquiring comprehensive traffic knowledge through a spatial-temporal-aware pre-training process. Subsequently, the framework employs advanced clustering techniques to systematically generate a multi-scale traffic pattern bank derived from the learned knowledge. Next, the traffic data of the data-scarce target city could query the traffic pattern bank, facilitating the aggregation of meta-knowledge. This meta-knowledge, in turn, assumes a pivotal role as a robust guide in subsequent processes involving graph reconstruction and forecasting. Empirical assessments conducted on real-world traffic datasets affirm the superior performance of MTPB, surpassing existing methods across various categories and exhibiting numerous attributes conducive to the advancement of cross-city few-shot forecasting methodologies. The code is available in https://github.com/zhyliu00/MTPB.

CVAug 25, 2023
Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions

Yibo Wang, Yunhu Ye, Yuanpeng Mao et al.

Text segmentation tasks have a very wide range of application values, such as image editing, style transfer, watermark removal, etc.However, existing public datasets are of poor quality of pixel-level labels that have been shown to be notoriously costly to acquire, both in terms of money and time. At the same time, when pretraining is performed on synthetic datasets, the data distribution of the synthetic datasets is far from the data distribution in the real scene. These all pose a huge challenge to the current pixel-level text segmentation algorithms.To alleviate the above problems, we propose a self-supervised scene text segmentation algorithm with layered decoupling of representations derived from the object-centric manner to segment images into texts and background. In our method, we propose two novel designs which include Region Query Module and Representation Consistency Constraints adapting to the unique properties of text as complements to Auto Encoder, which improves the network's sensitivity to texts.For this unique design, we treat the polygon-level masks predicted by the text localization model as extra input information, and neither utilize any pixel-level mask annotations for training stage nor pretrain on synthetic datasets.Extensive experiments show the effectiveness of the method proposed. On several public scene text datasets, our method outperforms the state-of-the-art unsupervised segmentation algorithms.

12.2LGMay 11
Unlocking air traffic flow prediction through microscopic aircraft-state modeling

Bin Wang, Anqi Liu, Jiangtao Zhao et al.

Short-term air traffic flow prediction in terminal airspace is essential for proactive air traffic management. Existing approaches predominantly model traffic flow as aggregated time series, despite traffic dynamics being governed by aircraft states and interactions in continuous airspace. Such aggregation obscures fine-grained information including aircraft kinematics, boundary interactions, and control intent. Here we present AeroSense, a state-to-flow modeling framework that predicts future traffic flow directly from instantaneous airspace situations represented as dynamic sets of aircraft states derived from ADS-B trajectories. By establishing an end-to-end mapping from microscopic aircraft states to future regional traffic flow, AeroSense preserves aircraft-level dynamics while naturally accommodating varying traffic density without relying on historical look-back windows. Experiments on a large-scale real-world dataset show that AeroSense consistently improves predictive accuracy over aggregation-based forecasting approaches, particularly during high-density traffic periods. These findings suggest that instantaneous airspace situations provide an effective alternative to conventional time-series-based traffic forecasting paradigms.

LGMay 29, 2025Code
Non-collective Calibrating Strategy for Time Series Forecasting

Bin Wang, Yongqi Han, Minbo Ma et al.

Deep learning-based approaches have demonstrated significant advancements in time series forecasting. Despite these ongoing developments, the complex dynamics of time series make it challenging to establish the rule of thumb for designing the golden model architecture. In this study, we argue that refining existing advanced models through a universal calibrating strategy can deliver substantial benefits with minimal resource costs, as opposed to elaborating and training a new model from scratch. We first identify a multi-target learning conflict in the calibrating process, which arises when optimizing variables across time steps, leading to the underutilization of the model's learning capabilities. To address this issue, we propose an innovative calibrating strategy called Socket+Plug (SoP). This approach retains an exclusive optimizer and early-stopping monitor for each predicted target within each Plug while keeping the fully trained Socket backbone frozen. The model-agnostic nature of SoP allows it to directly calibrate the performance of any trained deep forecasting models, regardless of their specific architectures. Extensive experiments on various time series benchmarks and a spatio-temporal meteorological ERA5 dataset demonstrate the effectiveness of SoP, achieving up to a 22% improvement even when employing a simple MLP as the Plug (highlighted in Figure 1). Code is available at https://github.com/hanyuki23/SoP.

19.8LGApr 16
From Time Series to State: Situation-Aware Modeling for Air Traffic Flow Prediction

Anqi Liu, Jiangtao Zhao, Guiyuan Jiang et al.

Accurate air traffic prediction in the terminal airspace (TA) is pivotal for proactive air traffic management (ATM). However, existing data-driven approaches predominantly rely on time series-based forecasting paradigms, which inherently overlook critical aircraft state information, such as real-time kinematics and proximity to airspace boundaries. To address this limitation, we propose \textit{AeroSense}, a direct state-to-flow modeling framework for air traffic prediction. Unlike classical time series-based methods that first aggregate aircraft trajectories into macroscopic flow sequences before modeling, AeroSense explicitly represents the real-time airspace situation as \textit{a dynamic set of aircraft states}, enabling the direct processing of a variable number of aircraft instead of time series as inputs. Specifically, we introduce a situation-aware state representation that enables AeroSense to sense the instantaneous terminal airspace situation directly from microscopic aircraft states. Furthermore, we design a model architecture that incorporates masked self-attention to capture inter-aircraft interactions, together with two decoupled prediction heads to model heterogeneous flow dynamics across two key functional areas of the TA. Extensive experiments on a large-scale real-world airport dataset demonstrate that AeroSense consistently achieves state-of-the-art performance, validating that direct modeling of microscopic aircraft states yields substantially higher predictive fidelity than time series-based baselines. Moreover, the proposed framework exhibits superior robustness during peak traffic periods, achieves Pareto-optimal performance under dayparting multi-object evaluation, and provides meaningful interpretability through attention-based visualizations.

LGFeb 17, 2025
Spatiotemporal-aware Trend-Seasonality Decomposition Network for Traffic Flow Forecasting

Lingxiao Cao, Bin Wang, Guiyuan Jiang et al.

Traffic prediction is critical for optimizing travel scheduling and enhancing public safety, yet the complex spatial and temporal dynamics within traffic data present significant challenges for accurate forecasting. In this paper, we introduce a novel model, the Spatiotemporal-aware Trend-Seasonality Decomposition Network (STDN). This model begins by constructing a dynamic graph structure to represent traffic flow and incorporates novel spatio-temporal embeddings to jointly capture global traffic dynamics. The representations learned are further refined by a specially designed trend-seasonality decomposition module, which disentangles the trend-cyclical component and seasonal component for each traffic node at different times within the graph. These components are subsequently processed through an encoder-decoder network to generate the final predictions. Extensive experiments conducted on real-world traffic datasets demonstrate that STDN achieves superior performance with remarkable computation cost. Furthermore, we have released a new traffic dataset named JiNan, which features unique inner-city dynamics, thereby enriching the scenario comprehensiveness in traffic prediction evaluation.

LGMar 12, 2024
Dataset Condensation for Time Series Classification via Dual Domain Matching

Zhanyu Liu, Ke Hao, Guanjie Zheng et al.

Time series data has been demonstrated to be crucial in various research fields. The management of large quantities of time series data presents challenges in terms of deep learning tasks, particularly for training a deep neural network. Recently, a technique named \textit{Dataset Condensation} has emerged as a solution to this problem. This technique generates a smaller synthetic dataset that has comparable performance to the full real dataset in downstream tasks such as classification. However, previous methods are primarily designed for image and graph datasets, and directly adapting them to the time series dataset leads to suboptimal performance due to their inability to effectively leverage the rich information inherent in time series data, particularly in the frequency domain. In this paper, we propose a novel framework named Dataset \textit{\textbf{Cond}}ensation for \textit{\textbf{T}}ime \textit{\textbf{S}}eries \textit{\textbf{C}}lassification via Dual Domain Matching (\textbf{CondTSC}) which focuses on the time series classification dataset condensation task. Different from previous methods, our proposed framework aims to generate a condensed dataset that matches the surrogate objectives in both the time and frequency domains. Specifically, CondTSC incorporates multi-view data augmentation, dual domain training, and dual surrogate objectives to enhance the dataset condensation process in the time and frequency domains. Through extensive experiments, we demonstrate the effectiveness of our proposed framework, which outperforms other baselines and learns a condensed synthetic dataset that exhibits desirable characteristics such as conforming to the distribution of the original data.

IRMar 18, 2024
Dual-Channel Multiplex Graph Neural Networks for Recommendation

Xiang Li, Chaofan Fu, Zhongying Zhao et al.

Effective recommender systems play a crucial role in accurately capturing user and item attributes that mirror individual preferences. Some existing recommendation techniques have started to shift their focus towards modeling various types of interactive relations between users and items in real-world recommendation scenarios, such as clicks, marking favorites, and purchases on online shopping platforms. Nevertheless, these approaches still grapple with two significant challenges: (1) Insufficient modeling and exploitation of the impact of various behavior patterns formed by multiplex relations between users and items on representation learning, and (2) ignoring the effect of different relations within behavior patterns on the target relation in recommender system scenarios. In this work, we introduce a novel recommendation framework, Dual-Channel Multiplex Graph Neural Network (DCMGNN), which addresses the aforementioned challenges. It incorporates an explicit behavior pattern representation learner to capture the behavior patterns composed of multiplex user-item interactive relations, and includes a relation chain representation learner and a relation chain-aware encoder to discover the impact of various auxiliary relations on the target relation, the dependencies between different relations, and mine the appropriate order of relations in a behavior pattern. Extensive experiments on three real-world datasets demonstrate that our DCMGNN surpasses various state-of-the-art recommendation methods. It outperforms the best baselines by 10.06% and 12.15% on average across all datasets in terms of Recall@10 and NDCG@10, respectively.

LGMar 19, 2025
Scalable Trajectory-User Linking with Dual-Stream Representation Networks

Hao Zhang, Wei Chen, Xingyu Zhao et al.

Trajectory-user linking (TUL) aims to match anonymous trajectories to the most likely users who generated them, offering benefits for a wide range of real-world spatio-temporal applications. However, existing TUL methods are limited by high model complexity and poor learning of the effective representations of trajectories, rendering them ineffective in handling large-scale user trajectory data. In this work, we propose a novel $\underline{Scal}$abl$\underline{e}$ Trajectory-User Linking with dual-stream representation networks for large-scale $\underline{TUL}$ problem, named ScaleTUL. Specifically, ScaleTUL generates two views using temporal and spatial augmentations to exploit supervised contrastive learning framework to effectively capture the irregularities of trajectories. In each view, a dual-stream trajectory encoder, consisting of a long-term encoder and a short-term encoder, is designed to learn unified trajectory representations that fuse different temporal-spatial dependencies. Then, a TUL layer is used to associate the trajectories with the corresponding users in the representation space using a two-stage training model. Experimental results on check-in mobility datasets from three real-world cities and the nationwide U.S. demonstrate the superiority of ScaleTUL over state-of-the-art baselines for large-scale TUL tasks.

LGMar 18, 2025
Incorporating Attributes and Multi-Scale Structures for Heterogeneous Graph Contrastive Learning

Ruobing Jiang, Yacong Li, Haobing Liu et al.

Heterogeneous graphs (HGs) are composed of multiple types of nodes and edges, making it more effective in capturing the complex relational structures inherent in the real world. However, in real-world scenarios, labeled data is often difficult to obtain, which limits the applicability of semi-supervised approaches. Self-supervised learning aims to enable models to automatically learn useful features from data, effectively addressing the challenge of limited labeling data. In this paper, we propose a novel contrastive learning framework for heterogeneous graphs (ASHGCL), which incorporates three distinct views, each focusing on node attributes, high-order and low-order structural information, respectively, to effectively capture attribute information, high-order structures, and low-order structures for node representation learning. Furthermore, we introduce an attribute-enhanced positive sample selection strategy that combines both structural information and attribute information, effectively addressing the issue of sampling bias. Extensive experiments on four real-world datasets show that ASHGCL outperforms state-of-the-art unsupervised baselines and even surpasses some supervised benchmarks.

LGNov 19, 2024
UMGAD: Unsupervised Multiplex Graph Anomaly Detection

Xiang Li, Jianpeng Qi, Zhongying Zhao et al.

Graph anomaly detection (GAD) is a critical task in graph machine learning, with the primary objective of identifying anomalous nodes that deviate significantly from the majority. This task is widely applied in various real-world scenarios, including fraud detection and social network analysis. However, existing GAD methods still face two major challenges: (1) They are often limited to detecting anomalies in single-type interaction graphs and struggle with multiple interaction types in multiplex heterogeneous graphs. (2) In unsupervised scenarios, selecting appropriate anomaly score thresholds remains a significant challenge for accurate anomaly detection. To address the above challenges, we propose a novel Unsupervised Multiplex Graph Anomaly Detection method, named UMGAD. We first learn multi-relational correlations among nodes in multiplex heterogeneous graphs and capture anomaly information during node attribute and structure reconstruction through graph-masked autoencoder (GMAE). Then, to further extract abnormal information, we generate attribute-level and subgraph-level augmented-view graphs, respectively, and perform attribute and structure reconstruction through GMAE. Finally, we learn to optimize node attributes and structural features through contrastive learning between original-view and augmented-view graphs to improve the model's ability to capture anomalies. Meanwhile, we propose a new anomaly score threshold selection strategy, which allows the model to be independent of ground truth information in real unsupervised scenarios. Extensive experiments on six datasets show that our UMGAD significantly outperforms state-of-the-art methods, achieving average improvements of 12.25% in AUC and 11.29% in Macro-F1 across all datasets.

AIDec 24, 2024
AutoSculpt: A Pattern-based Model Auto-pruning Framework Using Reinforcement Learning and Graph Learning

Lixian Jing, Jianpeng Qi, Junyu Dong et al.

As deep neural networks (DNNs) are increasingly deployed on edge devices, optimizing models for constrained computational resources is critical. Existing auto-pruning methods face challenges due to the diversity of DNN models, various operators (e.g., filters), and the difficulty in balancing pruning granularity with model accuracy. To address these limitations, we introduce AutoSculpt, a pattern-based automated pruning framework designed to enhance efficiency and accuracy by leveraging graph learning and deep reinforcement learning (DRL). AutoSculpt automatically identifies and prunes regular patterns within DNN architectures that can be recognized by existing inference engines, enabling runtime acceleration. Three key steps in AutoSculpt include: (1) Constructing DNNs as graphs to encode their topology and parameter dependencies, (2) embedding computationally efficient pruning patterns, and (3) utilizing DRL to iteratively refine auto-pruning strategies until the optimal balance between compression and accuracy is achieved. Experimental results demonstrate the effectiveness of AutoSculpt across various architectures, including ResNet, MobileNet, VGG, and Vision Transformer, achieving pruning rates of up to 90% and nearly 18% improvement in FLOPs reduction, outperforming all baselines. The codes can be available at https://anonymous.4open.science/r/AutoSculpt-DDA0

SIMar 28, 2025
Correlation-Attention Masked Temporal Transformer for User Identity Linkage Using Heterogeneous Mobility Data

Ziang Yan, Xingyu Zhao, Hanqing Ma et al.

With the rise of social media and Location-Based Social Networks (LBSN), check-in data across platforms has become crucial for User Identity Linkage (UIL). These data not only reveal users' spatio-temporal information but also provide insights into their behavior patterns and interests. However, cross-platform identity linkage faces challenges like poor data quality, high sparsity, and noise interference, which hinder existing methods from extracting cross-platform user information. To address these issues, we propose a Correlation-Attention Masked Transformer for User Identity Linkage Network (MT-Link), a transformer-based framework to enhance model performance by learning spatio-temporal co-occurrence patterns of cross-platform users. Our model effectively captures spatio-temporal co-occurrence in cross-platform user check-in sequences. It employs a correlation attention mechanism to detect the spatio-temporal co-occurrence between user check-in sequences. Guided by attention weight maps, the model focuses on co-occurrence points while filtering out noise, ultimately improving classification performance. Experimental results show that our model significantly outperforms state-of-the-art baselines by 12.92%~17.76% and 5.80%~8.38% improvements in terms of Macro-F1 and Area Under Curve (AUC).

NIDec 24, 2024
TPAoI: Ensuring Fresh Service Status at the Network Edge in Compute-First Networking

Haosheng He, Jianpeng Qi, Chao Liu et al.

In compute-first networking, maintaining fresh and accurate status information at the network edge is crucial for effective access to remote services. This process typically involves three phases: Status updating, user accessing, and user requesting. However, current studies on status effectiveness, such as Age of Information at Query (QAoI), do not comprehensively cover all these phases. Therefore, this paper introduces a novel metric, TPAoI, aimed at optimizing update decisions by measuring the freshness of service status. The stochastic nature of edge environments, characterized by unpredictable communication delays in updating, requesting, and user access times, poses a significant challenge when modeling. To address this, we model the problem as a Markov Decision Process (MDP) and employ a Dueling Double Deep Q-Network (D3QN) algorithm for optimization. Extensive experiments demonstrate that the proposed TPAoI metric effectively minimizes AoI, ensuring timely and reliable service updates in dynamic edge environments. Results indicate that TPAoI reduces AoI by an average of 47\% compared to QAoI metrics and decreases update frequency by an average of 48\% relative to conventional AoI metrics, showing significant improvement.

LGAug 19, 2025
Hierarchy-Consistent Learning and Adaptive Loss Balancing for Hierarchical Multi-Label Classification

Ruobing Jiang, Mengzhe Liu, Haobing Liu et al.

Hierarchical Multi-Label Classification (HMC) faces critical challenges in maintaining structural consistency and balancing loss weighting in Multi-Task Learning (MTL). In order to address these issues, we propose a classifier called HCAL based on MTL integrated with prototype contrastive learning and adaptive task-weighting mechanisms. The most significant advantage of our classifier is semantic consistency including both prototype with explicitly modeling label and feature aggregation from child classes to parent classes. The other important advantage is an adaptive loss-weighting mechanism that dynamically allocates optimization resources by monitoring task-specific convergence rates. It effectively resolves the "one-strong-many-weak" optimization bias inherent in traditional MTL approaches. To further enhance robustness, a prototype perturbation mechanism is formulated by injecting controlled noise into prototype to expand decision boundaries. Additionally, we formalize a quantitative metric called Hierarchical Violation Rate (HVR) as to evaluate hierarchical consistency and generalization. Extensive experiments across three datasets demonstrate both the higher classification accuracy and reduced hierarchical violation rate of the proposed classifier over baseline models.

CVAug 15, 2025
Exploring the Tradeoff Between Diversity and Discrimination for Continuous Category Discovery

Ruobing Jiang, Yang Liu, Haobing Liu et al.

Continuous category discovery (CCD) aims to automatically discover novel categories in continuously arriving unlabeled data. This is a challenging problem considering that there is no number of categories and labels in the newly arrived data, while also needing to mitigate catastrophic forgetting. Most CCD methods cannot handle the contradiction between novel class discovery and classification well. They are also prone to accumulate errors in the process of gradually discovering novel classes. Moreover, most of them use knowledge distillation and data replay to prevent forgetting, occupying more storage space. To address these limitations, we propose Independence-based Diversity and Orthogonality-based Discrimination (IDOD). IDOD mainly includes independent enrichment of diversity module, joint discovery of novelty module, and continuous increment by orthogonality module. In independent enrichment, the backbone is trained separately using contrastive loss to avoid it focusing only on features for classification. Joint discovery transforms multi-stage novel class discovery into single-stage, reducing error accumulation impact. Continuous increment by orthogonality module generates mutually orthogonal prototypes for classification and prevents forgetting with lower space overhead via representative representation replay. Experimental results show that on challenging fine-grained datasets, our method outperforms the state-of-the-art methods.

LGJun 18, 2025
TrajDiff: Diffusion Bridge Network with Semantic Alignment for Trajectory Similarity Computation

Xiao Zhang, Xingyu Zhao, Hong Xia et al.

With the proliferation of location-tracking technologies, massive volumes of trajectory data are continuously being collected. As a fundamental task in trajectory data mining, trajectory similarity computation plays a critical role in a wide range of real-world applications. However, existing learning-based methods face three challenges: First, they ignore the semantic gap between GPS and grid features in trajectories, making it difficult to obtain meaningful trajectory embeddings. Second, the noise inherent in the trajectories, as well as the noise introduced during grid discretization, obscures the true motion patterns of the trajectories. Third, existing methods focus solely on point-wise and pair-wise losses, without utilizing the global ranking information obtained by sorting all trajectories according to their similarity to a given trajectory. To address the aforementioned challenges, we propose a novel trajectory similarity computation framework, named TrajDiff. Specifically, the semantic alignment module relies on cross-attention and an attention score mask mechanism with adaptive fusion, effectively eliminating semantic discrepancies between data at two scales and generating a unified representation. Additionally, the DDBM-based Noise-robust Pre-Training introduces the transfer patterns between any two trajectories into the model training process, enhancing the model's noise robustness. Finally, the overall ranking-aware regularization shifts the model's focus from a local to a global perspective, enabling it to capture the holistic ordering information among trajectories. Extensive experiments on three publicly available datasets show that TrajDiff consistently outperforms state-of-the-art baselines. In particular, it achieves an average HR@1 gain of 33.38% across all three evaluation metrics and datasets.

AIJun 13, 2025
Structure-Aware Automatic Channel Pruning by Searching with Graph Embedding

Zifan Liu, Yuan Cao, Yanwei Yu et al.

Channel pruning is a powerful technique to reduce the computational overhead of deep neural networks, enabling efficient deployment on resource-constrained devices. However, existing pruning methods often rely on local heuristics or weight-based criteria that fail to capture global structural dependencies within the network, leading to suboptimal pruning decisions and degraded model performance. To address these limitations, we propose a novel structure-aware automatic channel pruning (SACP) framework that utilizes graph convolutional networks (GCNs) to model the network topology and learn the global importance of each channel. By encoding structural relationships within the network, our approach implements topology-aware pruning and this pruning is fully automated, reducing the need for human intervention. We restrict the pruning rate combinations to a specific space, where the number of combinations can be dynamically adjusted, and use a search-based approach to determine the optimal pruning rate combinations. Extensive experiments on benchmark datasets (CIFAR-10, ImageNet) with various models (ResNet, VGG16) demonstrate that SACP outperforms state-of-the-art pruning methods on compression efficiency and competitive on accuracy retention.

DBApr 22, 2025
Efficient Discovery of Motif Transition Process for Large-Scale Temporal Graphs

Zhiyuan Zheng, Jianpeng Qi, Jiantao Li et al.

Understanding the dynamic transition of motifs in temporal graphs is essential for revealing how graph structures evolve over time, identifying critical patterns, and predicting future behaviors, yet existing methods often focus on predefined motifs, limiting their ability to comprehensively capture transitions and interrelationships. We propose a parallel motif transition process discovery algorithm, PTMT, a novel parallel method for discovering motif transition processes in large-scale temporal graphs. PTMT integrates a tree-based framework with the temporal zone partitioning (TZP) strategy, which partitions temporal graphs by time and structure while preserving lossless motif transitions and enabling massive parallelism. PTMT comprises three phases: growth zone parallel expansion, overlap-aware result aggregation, and deterministic encoding of motif transitions, ensuring accurate tracking of dynamic transitions and interactions. Results on 10 real-world datasets demonstrate that PTMT achieves speedups ranging from 12.0$\times$ to 50.3$\times$ compared to the SOTA method.

LGApr 22, 2025
ScaleGNN: Towards Scalable Graph Neural Networks via Adaptive High-order Neighboring Feature Fusion

Xiang Li, Jianpeng Qi, Haobing Liu et al.

Graph Neural Networks (GNNs) have demonstrated impressive performance across diverse graph-based tasks by leveraging message passing to capture complex node relationships. However, when applied to large-scale real-world graphs, GNNs face two major challenges: First, it becomes increasingly difficult to ensure both scalability and efficiency, as the repeated aggregation of large neighborhoods leads to significant computational overhead; Second, the over-smoothing problem arises, where excessive or deep propagation makes node representations indistinguishable, severely hindering model expressiveness. To tackle these issues, we propose ScaleGNN, a novel framework that adaptively fuses multi-hop node features for both scalable and effective graph learning. First, we construct per-hop pure neighbor matrices that capture only the exclusive structural information at each hop, avoiding the redundancy of conventional aggregation. Then, an enhanced feature fusion strategy significantly balances low-order and high-order information, preserving both local detail and global correlations without incurring excessive complexity. To further reduce redundancy and over-smoothing, we introduce a Local Contribution Score (LCS)-based masking mechanism to filter out less relevant high-order neighbors, ensuring that only the most meaningful information is aggregated. In addition, learnable sparse constraints selectively integrate multi-hop valuable features, emphasizing the most informative high-order neighbors. Extensive experiments on real-world datasets demonstrate that ScaleGNN consistently outperforms state-of-the-art GNNs in both predictive accuracy and computational efficiency, highlighting its practical value for large-scale graph learning.

IRDec 18, 2024
Lightweight yet Fine-grained: A Graph Capsule Convolutional Network with Subspace Alignment for Shared-account Sequential Recommendation

Jinyu Zhang, Zhongying Zhao, Chao Li et al.

Shared-account Sequential Recommendation (SSR) aims to provide personalized recommendations for accounts shared by multiple users with varying sequential preferences. Previous studies on SSR struggle to capture the fine-grained associations between interactions and different latent users within the shared account's hybrid sequences. Moreover, most existing SSR methods (e.g., RNN-based or GCN-based methods) have quadratic computational complexities, hindering the deployment of SSRs on resource-constrained devices. To this end, we propose a Lightweight Graph Capsule Convolutional Network with subspace alignment for shared-account sequential recommendation, named LightGC$^2$N. Specifically, we devise a lightweight graph capsule convolutional network. It facilitates the fine-grained matching between interactions and latent users by attentively propagating messages on the capsule graphs. Besides, we present an efficient subspace alignment method. This method refines the sequence representations and then aligns them with the finely clustered preferences of latent users. The experimental results on four real-world datasets indicate that LightGC$^2$N outperforms nine state-of-the-art methods in accuracy and efficiency.

LGNov 2, 2024
Multi-Channel Hypergraph Contrastive Learning for Matrix Completion

Xiang Li, Changsheng Shui, Zhongying Zhao et al.

Rating is a typical user explicit feedback that visually reflects how much a user likes a related item. The (rating) matrix completion is essentially a rating prediction process, which is also a significant problem in recommender systems. Recently, graph neural networks (GNNs) have been widely used in matrix completion, which captures users' preferences over items by formulating a rating matrix as a bipartite graph. However, existing methods are susceptible due to data sparsity and long-tail distribution in real-world scenarios. Moreover, the messaging mechanism of GNNs makes it difficult to capture high-order correlations and constraints between nodes, which are essentially useful in recommendation tasks. To tackle these challenges, we propose a Multi-Channel Hypergraph Contrastive Learning framework for matrix completion, named MHCL. Specifically, MHCL adaptively learns hypergraph structures to capture high-order correlations between nodes and jointly captures local and global collaborative relationships through attention-based cross-view aggregation. Additionally, to consider the magnitude and order information of ratings, we treat different rating subgraphs as different channels, encourage alignment between adjacent ratings, and further achieve the mutual enhancement between different ratings through multi-channel cross-rating contrastive learning. Extensive experiments on five public datasets demonstrate that the proposed method significantly outperforms the current state-of-the-art approaches.

CVJul 4, 2021
Robust End-to-End Offline Chinese Handwriting Text Page Spotter with Text Kernel

Zhihao Wang, Yanwei Yu, Yibo Wang et al.

Offline Chinese handwriting text recognition is a long-standing research topic in the field of pattern recognition. In previous studies, text detection and recognition are separated, which leads to the fact that text recognition is highly dependent on the detection results. In this paper, we propose a robust end-to-end Chinese text page spotter framework. It unifies text detection and text recognition with text kernel that integrates global text feature information to optimize the recognition from multiple scales, which reduces the dependence of detection and improves the robustness of the system. Our method achieves state-of-the-art results on the CASIA-HWDB2.0-2.2 dataset and ICDAR-2013 competition dataset. Without any language model, the correct rates are 99.12% and 94.27% for line-level recognition, and 99.03% and 94.20% for page-level recognition, respectively.

LGFeb 25, 2019
Joint Modeling of Dense and Incomplete Trajectories for Citywide Traffic Volume Inference

Xianfeng Tang, Boqing Gong, Yanwei Yu et al.

Real-time traffic volume inference is key to an intelligent city. It is a challenging task because accurate traffic volumes on the roads can only be measured at certain locations where sensors are installed. Moreover, the traffic evolves over time due to the influences of weather, events, holidays, etc. Existing solutions to the traffic volume inference problem often rely on dense GPS trajectories, which inevitably fail to account for the vehicles which carry no GPS devices or have them turned off. Consequently, the results are biased to taxicabs because they are almost always online for GPS tracking. In this paper, we propose a novel framework for the citywide traffic volume inference using both dense GPS trajectories and incomplete trajectories captured by camera surveillance systems. Our approach employs a high-fidelity traffic simulator and deep reinforcement learning to recover full vehicle movements from the incomplete trajectories. In order to jointly model the recovered trajectories and dense GPS trajectories, we construct spatiotemporal graphs and use multi-view graph embedding to encode the multi-hop correlations between road segments into real-valued vectors. Finally, we infer the citywide traffic volumes by propagating the traffic values of monitored road segments to the unmonitored ones through masked pairwise similarities. Extensive experiments with two big regions in a provincial capital city in China verify the effectiveness of our approach.