Youfang Lin

LG
h-index27
53papers
1,233citations
Novelty50%
AI Score58

53 Papers

DBJun 19, 2023Code
LaDe: The First Comprehensive Last-mile Delivery Dataset from Industry

Lixia Wu, Haomin Wen, Haoyuan Hu et al.

Real-world last-mile delivery datasets are crucial for research in logistics, supply chain management, and spatio-temporal data mining. Despite a plethora of algorithms developed to date, no widely accepted, publicly available last-mile delivery dataset exists to support research in this field. In this paper, we introduce \texttt{LaDe}, the first publicly available last-mile delivery dataset with millions of packages from the industry. LaDe has three unique characteristics: (1) Large-scale. It involves 10,677k packages of 21k couriers over 6 months of real-world operation. (2) Comprehensive information. It offers original package information, such as its location and time requirements, as well as task-event information, which records when and where the courier is while events such as task-accept and task-finish events happen. (3) Diversity. The dataset includes data from various scenarios, including package pick-up and delivery, and from multiple cities, each with its unique spatio-temporal patterns due to their distinct characteristics such as populations. We verify LaDe on three tasks by running several classical baseline models per task. We believe that the large-scale, comprehensive, diverse feature of LaDe can offer unparalleled opportunities to researchers in the supply chain community, data mining community, and beyond. The dataset homepage is publicly available at https://huggingface.co/datasets/Cainiao-AI/LaDe.

LGJun 7, 2023Code
Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL

Peng Cheng, Xianyuan Zhan, Zhihao Wu et al.

Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets without interacting with the environment. However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets. Real-world data collection is often expensive and uncontrollable, leading to small and narrowly covered datasets and posing significant challenges for practical deployments of offline RL. In this paper, we provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance under small datasets. Specifically, we propose a Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM), which establishes consistency between a pair of forward and reverse latent dynamics. TDM provides both well-behaved representations for small datasets and a new reliability measure for OOD samples based on compliance with the T-symmetry. These can be readily used to construct a new offline RL algorithm (TSRL) with less conservative policy constraints and a reliable latent space data augmentation procedure. Based on extensive experiments, we find TSRL achieves great performance on small benchmark datasets with as few as 1% of the original samples, which significantly outperforms the recent offline RL algorithms in terms of data efficiency and generalizability.Code is available at: https://github.com/pcheng2/TSRL

CVAug 9, 2024Code
TrajFM: A Vehicle Trajectory Foundation Model for Region and Task Transferability

Yan Lin, Tonglong Wei, Zeyu Zhou et al.

Vehicle trajectories provide valuable movement information that supports various downstream tasks and powers real-world applications. A desirable trajectory learning model should transfer between different regions and tasks without retraining, thus improving computational efficiency and effectiveness with limited training data. However, a model's ability to transfer across regions is limited by the unique spatial features and POI arrangements of each region, which are closely linked to vehicle movement patterns and difficult to generalize. Additionally, achieving task transferability is challenging due to the differing generation schemes required for various tasks. Existing efforts towards transferability primarily involve learning embedding vectors for trajectories, which perform poorly in region transfer and still require retraining of prediction modules for task transfer. To address these challenges, we propose TrajFM, a vehicle trajectory foundation model that excels in both region and task transferability. For region transferability, we introduce STRFormer as the main learnable model within TrajFM. It integrates spatial, temporal, and POI modalities of trajectories to effectively manage variations in POI arrangements across regions and includes a learnable spatio-temporal Rotary position embedding module for handling spatial features. For task transferability, we propose a trajectory masking and recovery scheme. This scheme unifies the generation processes of various tasks into the masking and recovery of modalities and sub-trajectories, allowing TrajFM to be pre-trained once and transferred to different tasks without retraining. Experiments on two real-world vehicle trajectory datasets under various settings demonstrate the effectiveness of TrajFM. Code is available at https://anonymous.4open.science/r/TrajFM-30E4.

LGJul 12, 2024Code
STD-PLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with PLM

YiHeng Huang, Xiaowei Mao, Shengnan Guo et al.

Spatial-temporal forecasting and imputation are important for real-world intelligent systems. Most existing methods are tailored for individual forecasting or imputation tasks but are not designed for both. Additionally, they are less effective for zero-shot and few-shot learning. While pre-trained language model (PLM) have exhibited strong pattern recognition and reasoning abilities across various tasks, including few-shot and zero-shot learning, their applications in spatial-temporal data understanding has been constrained by insufficient modeling of complex correlations such as the temporal correlations, spatial connectivity, non-pairwise and high-order spatial-temporal correlations within data. In this paper, we propose STD-PLM for understanding both spatial and temporal properties of \underline{S}patial-\underline{T}emporal \underline{D}ata with \underline{PLM}, which is capable of implementing both spatial-temporal forecasting and imputation tasks. STD-PLM understands spatial-temporal correlations via explicitly designed spatial and temporal tokenizers. Topology-aware node embeddings are designed for PLM to comprehend and exploit the topology structure of data in inductive manner. Furthermore, to mitigate the efficiency issues introduced by the PLM, we design a sandglass attention module (SGA) combined with a specific constrained loss function, which significantly improves the model's efficiency while ensuring performance. Extensive experiments demonstrate that STD-PLM exhibits competitive performance and generalization capabilities across the forecasting and imputation tasks on various datasets. Moreover, STD-PLM achieves promising results on both few-shot and zero-shot tasks. The code is made available at \href{https://github.com/Hyheng/STD-PLM}{https://github.com/Hyheng/STD-PLM}

LGSep 26, 2024Code
PGN: The RNN's New Successor is Effective for Long-Range Time Series Forecasting

Yuxin Jia, Youfang Lin, Jing Yu et al.

Due to the recurrent structure of RNN, the long information propagation path poses limitations in capturing long-term dependencies, gradient explosion/vanishing issues, and inefficient sequential execution. Based on this, we propose a novel paradigm called Parallel Gated Network (PGN) as the new successor to RNN. PGN directly captures information from previous time steps through the designed Historical Information Extraction (HIE) layer and leverages gated mechanisms to select and fuse it with the current time step information. This reduces the information propagation path to $\mathcal{O}(1)$, effectively addressing the limitations of RNN. To enhance PGN's performance in long-range time series forecasting tasks, we propose a novel temporal modeling framework called Temporal PGN (TPGN). TPGN incorporates two branches to comprehensively capture the semantic information of time series. One branch utilizes PGN to capture long-term periodic patterns while preserving their local characteristics. The other branch employs patches to capture short-term information and aggregate the global representation of the series. TPGN achieves a theoretical complexity of $\mathcal{O}(\sqrt{L})$, ensuring efficiency in its operations. Experimental results on five benchmark datasets demonstrate the state-of-the-art (SOTA) performance and high efficiency of TPGN, further confirming the effectiveness of PGN as the new successor to RNN in long-range time series forecasting. The code is available in this repository: \url{https://github.com/Water2sea/TPGN}.

LGJul 6, 2023
Origin-Destination Travel Time Oracle for Map-based Services

Yan Lin, Huaiyu Wan, Jilin Hu et al.

Given an origin (O), a destination (D), and a departure time (T), an Origin-Destination (OD) travel time oracle~(ODT-Oracle) returns an estimate of the time it takes to travel from O to D when departing at T. ODT-Oracles serve important purposes in map-based services. To enable the construction of such oracles, we provide a travel-time estimation (TTE) solution that leverages historical trajectories to estimate time-varying travel times for OD pairs. The problem is complicated by the fact that multiple historical trajectories with different travel times may connect an OD pair, while trajectories may vary from one another. To solve the problem, it is crucial to remove outlier trajectories when doing travel time estimation for future queries. We propose a novel, two-stage framework called Diffusion-based Origin-destination Travel Time Estimation (DOT), that solves the problem. First, DOT employs a conditioned Pixelated Trajectories (PiT) denoiser that enables building a diffusion-based PiT inference process by learning correlations between OD pairs and historical trajectories. Specifically, given an OD pair and a departure time, we aim to infer a PiT. Next, DOT encompasses a Masked Vision Transformer~(MViT) that effectively and efficiently estimates a travel time based on the inferred PiT. We report on extensive experiments on two real-world datasets that offer evidence that DOT is capable of outperforming baseline methods in terms of accuracy, scalability, and explainability.

LGAug 9, 2024Code
PTrajM: Efficient and Semantic-rich Trajectory Learning with Pretrained Trajectory-Mamba

Yan Lin, Yichen Liu, Zeyu Zhou et al.

Vehicle trajectories provide crucial movement information for various real-world applications. To better utilize vehicle trajectories, it is essential to develop a trajectory learning approach that can effectively and efficiently extract rich semantic information, including movement behavior and travel purposes, to support accurate downstream applications. However, creating such an approach presents two significant challenges. First, movement behavior are inherently spatio-temporally continuous, making them difficult to extract efficiently from irregular and discrete trajectory points. Second, travel purposes are related to the functionalities of areas and road segments traversed by vehicles. These functionalities are not available from the raw spatio-temporal trajectory features and are hard to extract directly from complex textual features associated with these areas and road segments. To address these challenges, we propose PTrajM, a novel method capable of efficient and semantic-rich vehicle trajectory learning. To support efficient modeling of movement behavior, we introduce Trajectory-Mamba as the learnable model of PTrajM, which effectively extracts continuous movement behavior while being more computationally efficient than existing structures. To facilitate efficient extraction of travel purposes, we propose a travel purpose-aware pre-training procedure, which enables PTrajM to discern the travel purposes of trajectories without additional computational resources during its embedding process. Extensive experiments on two real-world datasets and comparisons with several state-of-the-art trajectory learning methods demonstrate the effectiveness of PTrajM. Code is available at https://anonymous.4open.science/r/PTrajM-C973.

LGJul 17, 2024Code
UniTE: A Survey and Unified Pipeline for Pre-training Spatiotemporal Trajectory Embeddings

Yan Lin, Zeyu Zhou, Yicheng Liu et al.

Spatiotemporal trajectories are sequences of timestamped locations, which enable a variety of analyses that in turn enable important real-world applications. It is common to map trajectories to vectors, called embeddings, before subsequent analyses. Thus, the qualities of embeddings are very important. Methods for pre-training embeddings, which leverage unlabeled trajectories for training universal embeddings, have shown promising applicability across different tasks, thus attracting considerable interest. However, research progress on this topic faces two key challenges: a lack of a comprehensive overview of existing methods, resulting in several related methods not being well-recognized, and the absence of a unified pipeline, complicating the development of new methods and the analysis of methods. We present UniTE, a survey and a unified pipeline for this domain. In doing so, we present a comprehensive list of existing methods for pre-training trajectory embeddings, which includes methods that either explicitly or implicitly employ pre-training techniques. Further, we present a unified and modular pipeline with publicly available underlying code, simplifying the process of constructing and evaluating methods for pre-training trajectory embeddings. Additionally, we contribute a selection of experimental results using the proposed pipeline on real-world datasets. Implementation of the pipeline is publicly available at https://github.com/Logan-Lin/UniTE.

LGJan 31, 2023
DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising Diffusion Models

Haomin Wen, Youfang Lin, Yutong Xia et al.

Spatio-temporal graph neural networks (STGNN) have emerged as the dominant model for spatio-temporal graph (STG) forecasting. Despite their success, they fail to model intrinsic uncertainties within STG data, which cripples their practicality in downstream tasks for decision-making. To this end, this paper focuses on probabilistic STG forecasting, which is challenging due to the difficulty in modeling uncertainties and complex ST dependencies. In this study, we present the first attempt to generalize the popular denoising diffusion probabilistic models to STGs, leading to a novel non-autoregressive framework called DiffSTG, along with the first denoising network UGnet for STG in the framework. Our approach combines the spatio-temporal learning capabilities of STGNNs with the uncertainty measurements of diffusion models. Extensive experiments validate that DiffSTG reduces the Continuous Ranked Probability Score (CRPS) by 4%-14%, and Root Mean Squared Error (RMSE) by 2%-7% over existing methods on three real-world datasets.

CLOct 20, 2023
Towards Enhancing Relational Rules for Knowledge Graph Link Prediction

Shuhan Wu, Huaiyu Wan, Wei Chen et al. · pku

Graph neural networks (GNNs) have shown promising performance for knowledge graph reasoning. A recent variant of GNN called progressive relational graph neural network (PRGNN), utilizes relational rules to infer missing knowledge in relational digraphs and achieves notable results. However, during reasoning with PRGNN, two important properties are often overlooked: (1) the sequentiality of relation composition, where the order of combining different relations affects the semantics of the relational rules, and (2) the lagged entity information propagation, where the transmission speed of required information lags behind the appearance speed of new entities. Ignoring these properties leads to incorrect relational rule learning and decreased reasoning accuracy. To address these issues, we propose a novel knowledge graph reasoning approach, the Relational rUle eNhanced Graph Neural Network (RUN-GNN). Specifically, RUN-GNN employs a query related fusion gate unit to model the sequentiality of relation composition and utilizes a buffering update mechanism to alleviate the negative effect of lagged entity information propagation, resulting in higher-quality relational rule learning. Experimental results on multiple datasets demonstrate the superiority of RUN-GNN is superior on both transductive and inductive link prediction tasks.

CVJul 29, 2022
Pre-training General Trajectory Embeddings with Maximum Multi-view Entropy Coding

Yan Lin, Huaiyu Wan, Shengnan Guo et al.

Spatio-temporal trajectories provide valuable information about movement and travel behavior, enabling various downstream tasks that in turn power real-world applications. Learning trajectory embeddings can improve task performance but may incur high computational costs and face limited training data availability. Pre-training learns generic embeddings by means of specially constructed pretext tasks that enable learning from unlabeled data. Existing pre-training methods face (i) difficulties in learning general embeddings due to biases towards certain downstream tasks incurred by the pretext tasks, (ii) limitations in capturing both travel semantics and spatio-temporal correlations, and (iii) the complexity of long, irregularly sampled trajectories. To tackle these challenges, we propose Maximum Multi-view Trajectory Entropy Coding (MMTEC) for learning general and comprehensive trajectory embeddings. We introduce a pretext task that reduces biases in pre-trained trajectory embeddings, yielding embeddings that are useful for a wide variety of downstream tasks. We also propose an attention-based discrete encoder and a NeuralCDE-based continuous encoder that extract and represent travel behavior and continuous spatio-temporal correlations from trajectories in embeddings, respectively. Extensive experiments on two real-world datasets and three downstream tasks offer insight into the design properties of our proposal and indicate that it is capable of outperforming existing trajectory embedding methods.

LGJul 30, 2023
DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and Delivery Route Prediction

Xiaowei Mao, Haomin Wen, Hengrui Zhang et al.

Pick-up and Delivery Route Prediction (PDRP), which aims to estimate the future service route of a worker given his current task pool, has received rising attention in recent years. Deep neural networks based on supervised learning have emerged as the dominant model for the task because of their powerful ability to capture workers' behavior patterns from massive historical data. Though promising, they fail to introduce the non-differentiable test criteria into the training process, leading to a mismatch in training and test criteria. Which considerably trims down their performance when applied in practical systems. To tackle the above issue, we present the first attempt to generalize Reinforcement Learning (RL) to the route prediction task, leading to a novel RL-based framework called DRL4Route. It combines the behavior-learning abilities of previous deep learning models with the non-differentiable objective optimization ability of reinforcement learning. DRL4Route can serve as a plug-and-play component to boost the existing deep learning models. Based on the framework, we further implement a model named DRL4Route-GAE for PDRP in logistic service. It follows the actor-critic architecture which is equipped with a Generalized Advantage Estimator that can balance the bias and variance of the policy gradient estimates, thus achieving a more optimal policy. Extensive offline experiments and the online deployment show that DRL4Route-GAE improves Location Square Deviation (LSD) by 0.9%-2.7%, and Accuracy@3 (ACC@3) by 2.4%-3.2% over existing methods on the real-world dataset.

56.5IRApr 21Code
STK-Adapter: Incorporating Evolving Graph and Event Chain for Temporal Knowledge Graph Extrapolation

Shuyuan Zhao, Wei Chen, Weijie Zhang et al.

Temporal Knowledge Graph (TKG) extrapolation aims to predict future events based on historical facts. Recent studies have attempted to enhance TKG extrapolation by integrating TKG's evolving structural representations and textual event chains into Large Language Models (LLMs). Yet, two main challenges limit these approaches: (1) The loss of essential spatial-temporal information due to shallow alignment between TKG's graph evolving structural representation and the LLM's semantic space, and (2) the progressive dilution of the TKG's evolving structural features during LLM fine-tuning. To address these challenges, we propose the Spatial-Temporal Knowledge Adapter (STK-Adapter), which flexibly integrates the evolving graph encoder and the LLM to facilitate TKG reasoning. In STK-Adapter, a Spatial-Temporal MoE is designed to capture spatial structures and temporal patterns inherent in TKGs. An Event-Aware MoE is employed to model intricate temporal semantics dependencies within event chains. In addition, a Cross-Modality Alignment MoE is proposed to facilitate deep cross-modality alignment by TKG-guided attention experts. Extensive experiments on benchmark datasets demonstrate that STK-Adapter significantly outperforms state-of-the-art methods and exhibits strong generalization capabilities in cross-dataset task. The code is available at https://github.com/Zhaoshuyuan0246/STK-Adapter.

CVMay 26, 2022
Light Field Raindrop Removal via 4D Re-sampling

Dong Jing, Shuo Zhang, Song Chang et al.

The Light Field Raindrop Removal (LFRR) aims to restore the background areas obscured by raindrops in the Light Field (LF). Compared with single image, the LF provides more abundant information by regularly and densely sampling the scene. Since raindrops have larger disparities than the background in the LF, the majority of texture details occluded by raindrops are visible in other views. In this paper, we propose a novel LFRR network by directly utilizing the complementary pixel information of raindrop-free areas in the input raindrop LF, which consists of the re-sampling module and the refinement module. Specifically, the re-sampling module generates a new LF which is less polluted by raindrops through re-sampling position predictions and the proposed 4D interpolation. The refinement module improves the restoration of the completely occluded background areas and corrects the pixel error caused by 4D interpolation. Furthermore, we carefully build the first real scene LFRR dataset for model training and validation. Experiments demonstrate that the proposed method can effectively remove raindrops and achieves state-of-the-art performance in both background restoration and view consistency maintenance.

CVMar 13, 2023
View Adaptive Light Field Deblurring Networks with Depth Perception

Zeqi Shen, Shuo Zhang, Zhuhao Zhang et al.

The Light Field (LF) deblurring task is a challenging problem as the blur images are caused by different reasons like the camera shake and the object motion. The single image deblurring method is a possible way to solve this problem. However, since it deals with each view independently and cannot effectively utilize and maintain the LF structure, the restoration effect is usually not ideal. Besides, the LF blur is more complex because the degree is affected by the views and depth. Therefore, we carefully designed a novel LF deblurring network based on the LF blur characteristics. On one hand, since the blur degree varies a lot in different views, we design a novel view adaptive spatial convolution to deblur blurred LFs, which calculates the exclusive convolution kernel for each view. On the other hand, because the blur degree also varies with the depth of the object, a depth perception view attention is designed to deblur different depth areas by selectively integrating information from different views. Besides, we introduce an angular position embedding to maintain the LF structure better, which ensures the model correctly restores the view information. Quantitative and qualitative experimental results on synthetic and real images show that the deblurring effect of our method is better than other state-of-the-art methods.

AIAug 23, 2024
DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation

Xiaowei Mao, Yan Lin, Shengnan Guo et al.

Uncertainty quantification in travel time estimation (TTE) aims to estimate the confidence interval for travel time, given the origin (O), destination (D), and departure time (T). Accurately quantifying this uncertainty requires generating the most likely path and assessing travel time uncertainty along the path. This involves two main challenges: 1) Predicting a path that aligns with the ground truth, and 2) modeling the impact of travel time in each segment on overall uncertainty under varying conditions. We propose DutyTTE to address these challenges. For the first challenge, we introduce a deep reinforcement learning method to improve alignment between the predicted path and the ground truth, providing more accurate travel time information from road segments to improve TTE. For the second challenge, we propose a mixture of experts guided uncertainty quantification mechanism to better capture travel time uncertainty for each segment under varying contexts. Additionally, we calibrate our results using Hoeffding's upper-confidence bound to provide statistical guarantees for the estimated confidence intervals. Extensive experiments on two real-world datasets demonstrate the superiority of our proposed method.

LGJul 22, 2024
Spatial-Temporal Cross-View Contrastive Pre-training for Check-in Sequence Representation Learning

Letian Gong, Huaiyu Wan, Shengnan Guo et al.

The rapid growth of location-based services (LBS) has yielded massive amounts of data on human mobility. Effectively extracting meaningful representations for user-generated check-in sequences is pivotal for facilitating various downstream services. However, the user-generated check-in data are simultaneously influenced by the surrounding objective circumstances and the user's subjective intention. Specifically, the temporal uncertainty and spatial diversity exhibited in check-in data make it difficult to capture the macroscopic spatial-temporal patterns of users and to understand the semantics of user mobility activities. Furthermore, the distinct characteristics of the temporal and spatial information in check-in sequences call for an effective fusion method to incorporate these two types of information. In this paper, we propose a novel Spatial-Temporal Cross-view Contrastive Representation (STCCR) framework for check-in sequence representation learning. Specifically, STCCR addresses the above challenges by employing self-supervision from "spatial topic" and "temporal intention" views, facilitating effective fusion of spatial and temporal information at the semantic level. Besides, STCCR leverages contrastive clustering to uncover users' shared spatial topics from diverse mobility activities, while employing angular momentum contrast to mitigate the impact of temporal uncertainty and noise. We extensively evaluate STCCR on three real-world datasets and demonstrate its superior performance across three downstream tasks.

LGDec 25, 2025
RIPCN: A Road Impedance Principal Component Network for Probabilistic Traffic Flow Forecasting

Haochen Lv, Yan Lin, Shengnan Guo et al.

Accurate traffic flow forecasting is crucial for intelligent transportation services such as navigation and ride-hailing. In such applications, uncertainty estimation in forecasting is important because it helps evaluate traffic risk levels, assess forecast reliability, and provide timely warnings. As a result, probabilistic traffic flow forecasting (PTFF) has gained significant attention, as it produces both point forecasts and uncertainty estimates. However, existing PTFF approaches still face two key challenges: (1) how to uncover and model the causes of traffic flow uncertainty for reliable forecasting, and (2) how to capture the spatiotemporal correlations of uncertainty for accurate prediction. To address these challenges, we propose RIPCN, a Road Impedance Principal Component Network that integrates domain-specific transportation theory with spatiotemporal principal component learning for PTFF. RIPCN introduces a dynamic impedance evolution network that captures directional traffic transfer patterns driven by road congestion level and flow variability, revealing the direct causes of uncertainty and enhancing both reliability and interpretability. In addition, a principal component network is designed to forecast the dominant eigenvectors of future flow covariance, enabling the model to capture spatiotemporal uncertainty correlations. This design allows for accurate and efficient uncertainty estimation while also improving point prediction performance. Experimental results on real-world datasets show that our approach outperforms existing probabilistic forecasting methods.

SYOct 30, 2024Code
DiffLight: A Partial Rewards Conditioned Diffusion Model for Traffic Signal Control with Missing Data

Hanyang Chen, Yang Jiang, Shengnan Guo et al.

The application of reinforcement learning in traffic signal control (TSC) has been extensively researched and yielded notable achievements. However, most existing works for TSC assume that traffic data from all surrounding intersections is fully and continuously available through sensors. In real-world applications, this assumption often fails due to sensor malfunctions or data loss, making TSC with missing data a critical challenge. To meet the needs of practical applications, we introduce DiffLight, a novel conditional diffusion model for TSC under data-missing scenarios in the offline setting. Specifically, we integrate two essential sub-tasks, i.e., traffic data imputation and decision-making, by leveraging a Partial Rewards Conditioned Diffusion (PRCD) model to prevent missing rewards from interfering with the learning process. Meanwhile, to effectively capture the spatial-temporal dependencies among intersections, we design a Spatial-Temporal transFormer (STFormer) architecture. In addition, we propose a Diffusion Communication Mechanism (DCM) to promote better communication and control performance under data-missing scenarios. Extensive experiments on five datasets with various data-missing scenarios demonstrate that DiffLight is an effective controller to address TSC with missing data. The code of DiffLight is released at https://github.com/lokol5579/DiffLight-release.

AIDec 21, 2024Code
CognTKE: A Cognitive Temporal Knowledge Extrapolation Framework

Wei Chen, Yuting Wu, Shuhan Wu et al.

Reasoning future unknowable facts on temporal knowledge graphs (TKGs) is a challenging task, holding significant academic and practical values for various fields. Existing studies exploring explainable reasoning concentrate on modeling comprehensible temporal paths relevant to the query. Yet, these path-based methods primarily focus on local temporal paths appearing in recent times, failing to capture the complex temporal paths in TKG and resulting in the loss of longer historical relations related to the query. Motivated by the Dual Process Theory in cognitive science, we propose a \textbf{Cogn}itive \textbf{T}emporal \textbf{K}nowledge \textbf{E}xtrapolation framework (CognTKE), which introduces a novel temporal cognitive relation directed graph (TCR-Digraph) and performs interpretable global shallow reasoning and local deep reasoning over the TCR-Digraph. Specifically, the proposed TCR-Digraph is constituted by retrieving significant local and global historical temporal relation paths associated with the query. In addition, CognTKE presents the global shallow reasoner and the local deep reasoner to perform global one-hop temporal relation reasoning (System 1) and local complex multi-hop path reasoning (System 2) over the TCR-Digraph, respectively. The experimental results on four benchmark datasets demonstrate that CognTKE achieves significant improvement in accuracy compared to the state-of-the-art baselines and delivers excellent zero-shot reasoning ability. \textit{The code is available at https://github.com/WeiChen3690/CognTKE}.

AIApr 5, 2025Code
Towards An Efficient and Effective En Route Travel Time Estimation Framework

Zekai Shen, Haitao Yuan, Xiaowei Mao et al.

En route travel time estimation (ER-TTE) focuses on predicting the travel time of the remaining route. Existing ER-TTE methods always make re-estimation which significantly hinders real-time performance, especially when faced with the computational demands of simultaneous user requests. This results in delays and reduced responsiveness in ER-TTE services. We propose a general efficient framework U-ERTTE combining an Uncertainty-Guided Decision mechanism (UGD) and Fine-Tuning with Meta-Learning (FTML) to address these challenges. UGD quantifies the uncertainty and provides confidence intervals for the entire route. It selectively re-estimates only when the actual travel time deviates from the predicted confidence intervals, thereby optimizing the efficiency of ER-TTE. To ensure the accuracy of confidence intervals and accurate predictions that need to re-estimate, FTML is employed to train the model, enabling it to learn general driving patterns and specific features to adapt to specific tasks. Extensive experiments on two large-scale real datasets demonstrate that the U-ERTTE framework significantly enhances inference speed and throughput while maintaining high effectiveness. Our code is available at https://github.com/shenzekai/U-ERTTE

MAMar 2, 2023
GHQ: Grouped Hybrid Q Learning for Heterogeneous Cooperative Multi-agent Reinforcement Learning

Xiaoyang Yu, Youfang Lin, Xiangsen Wang et al.

Previous deep multi-agent reinforcement learning (MARL) algorithms have achieved impressive results, typically in homogeneous scenarios. However, heterogeneous scenarios are also very common and usually harder to solve. In this paper, we mainly discuss cooperative heterogeneous MARL problems in Starcraft Multi-Agent Challenges (SMAC) environment. We firstly define and describe the heterogeneous problems in SMAC. In order to comprehensively reveal and study the problem, we make new maps added to the original SMAC maps. We find that baseline algorithms fail to perform well in those heterogeneous maps. To address this issue, we propose the Grouped Individual-Global-Max Consistency (GIGM) and a novel MARL algorithm, Grouped Hybrid Q Learning (GHQ). GHQ separates agents into several groups and keeps individual parameters for each group, along with a novel hybrid structure for factorization. To enhance coordination between groups, we maximize the Inter-group Mutual Information (IGMI) between groups' trajectories. Experiments on original and new heterogeneous maps show the fabulous performance of GHQ compared to other state-of-the-art algorithms.

AIDec 6, 2024Code
DRL4AOI: A DRL Framework for Semantic-aware AOI Segmentation in Location-Based Services

Youfang Lin, Jinji Fu, Haomin Wen et al.

In Location-Based Services (LBS), such as food delivery, a fundamental task is segmenting Areas of Interest (AOIs), aiming at partitioning the urban geographical spaces into non-overlapping regions. Traditional AOI segmentation algorithms primarily rely on road networks to partition urban areas. While promising in modeling the geo-semantics, road network-based models overlooked the service-semantic goals (e.g., workload equality) in LBS service. In this paper, we point out that the AOI segmentation problem can be naturally formulated as a Markov Decision Process (MDP), which gradually chooses a nearby AOI for each grid in the current AOI's border. Based on the MDP, we present the first attempt to generalize Deep Reinforcement Learning (DRL) for AOI segmentation, leading to a novel DRL-based framework called DRL4AOI. The DRL4AOI framework introduces different service-semantic goals in a flexible way by treating them as rewards that guide the AOI generation. To evaluate the effectiveness of DRL4AOI, we develop and release an AOI segmentation system. We also present a representative implementation of DRL4AOI - TrajRL4AOI - for AOI segmentation in the logistics service. It introduces a Double Deep Q-learning Network (DDQN) to gradually optimize the AOI generation for two specific semantic goals: i) trajectory modularity, i.e., maximize tightness of the trajectory connections within an AOI and the sparsity of connections between AOIs, ii) matchness with the road network, i.e., maximizing the matchness between AOIs and the road network. Quantitative and qualitative experiments conducted on synthetic and real-world data demonstrate the effectiveness and superiority of our method. The code and system is publicly available at https://github.com/Kogler7/AoiOpt.

54.2SEMar 25
Boosting LLMs for Mutation Generation

Bo Wang, Ming Deng, Mingda Chen et al.

LLM-based mutation testing is a promising testing technology, but existing approaches typically rely on a fixed set of mutations as few-shot examples or none at all. This can result in generic low-quality mutations, missed context-specific mutation patterns, substantial numbers of redundant and uncompilable mutants, and limited semantic similarity to real bugs. To overcome these limitations, we introduce SMART (Semantic Mutation with Adaptive Retrieval and Tuning). SMART integrates retrieval-augmented generation (RAG) on a vectorized dataset of real-world bugs, focused code chunking, and supervised fine-tuning using mutations coupled with real-world bugs. We conducted an extensive empirical study of SMART using 1,991 real-world Java bugs from the Defects4J and ConDefects datasets, comparing SMART to the state-of-the-art LLM-based approaches, LLMut and LLMorpheus. The results reveal that SMART substantially improves mutation validity, effectiveness, and efficiency (even enabling small-scale 7B-scale models to match or even surpass large models like GPT-4o). We also demonstrate that SMART significantly improves downstream software engineering applications, including test case prioritization and fault localization. More specifically, SMART improves validity (weighted average generation rate) from 42.89% to 65.6%. It raises the non-duplicate rate from 87.38% to 95.62%, and the compilable rate from 88.85% to 90.21%. In terms of effectiveness, it achieves a real bug detection rate of 92.61% (vs. 57.86% for LLMut) and improves the average Ochiai coefficient from 25.61% to 38.44%. For fault localization, SMART ranks 64 more bugs as Top-1 under MUSE and 57 more under Metallaxis.

MAAug 14, 2024
Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space

Xiaoyang Yu, Youfang Lin, Shuo Wang et al.

In a multi-agent system (MAS), action semantics indicates the different influences of agents' actions toward other entities, and can be used to divide agents into groups in a physically heterogeneous MAS. Previous multi-agent reinforcement learning (MARL) algorithms apply global parameter-sharing across different types of heterogeneous agents without careful discrimination of different action semantics. This common implementation decreases the cooperation and coordination between agents in complex situations. However, fully independent agent parameters dramatically increase the computational cost and training difficulty. In order to benefit from the usage of different action semantics while also maintaining a proper parameter-sharing structure, we introduce the Unified Action Space (UAS) to fulfill the requirement. The UAS is the union set of all agent actions with different semantics. All agents first calculate their unified representation in the UAS, and then generate their heterogeneous action policies using different available-action-masks. To further improve the training of extra UAS parameters, we introduce a Cross-Group Inverse (CGI) loss to predict other groups' agent policies with the trajectory information. As a universal method for solving the physically heterogeneous MARL problem, we implement the UAS adding to both value-based and policy-based MARL algorithms, and propose two practical algorithms: U-QMIX and U-MAPPO. Experimental results in the SMAC environment prove the effectiveness of both U-QMIX and U-MAPPO compared with several state-of-the-art MARL methods.

LGFeb 12, 2024
Diff-RNTraj: A Structure-aware Diffusion Model for Road Network-constrained Trajectory Generation

Tonglong Wei, Youfang Lin, Shengnan Guo et al.

Trajectory data is essential for various applications as it records the movement of vehicles. However, publicly available trajectory datasets remain limited in scale due to privacy concerns, which hinders the development of trajectory data mining and trajectory-based applications. To address this issue, some methods for generating synthetic trajectories have been proposed to expand the scale of the dataset. However, all existing methods generate trajectories in the geographical coordinate system, which poses two limitations for their utilization in practical applications: 1) the inability to ensure that the generated trajectories are constrained on the road. 2) the lack of road-related information. In this paper, we propose a new problem to meet the practical application need, \emph{i.e.}, road network-constrained trajectory (RNTraj) generation, which can directly generate trajectories on the road network with road-related information. RNTraj is a hybrid type of data, in which each point is represented by a discrete road segment and a continuous moving rate. To generate RNTraj, we design a diffusion model called Diff-RNTraj. This model can effectively handle the hybrid RNTraj using a continuous diffusion framework by incorporating a pre-training strategy to embed hybrid RNTraj into continuous representations. During the sampling stage, a RNTraj decoder is designed to map the continuous representation generated by the diffusion model back to the hybrid RNTraj format. Furthermore, Diff-RNTraj introduces a novel loss function to enhance the spatial validity of the generated trajectories. Extensive experiments conducted on two real-world trajectory datasets demonstrate the effectiveness of the proposed model.

LGOct 29, 2024
Mobility-LLM: Learning Visiting Intentions and Travel Preferences from Human Mobility Data with Large Language Models

Letian Gong, Yan Lin, Xinyue Zhang et al.

Location-based services (LBS) have accumulated extensive human mobility data on diverse behaviors through check-in sequences. These sequences offer valuable insights into users' intentions and preferences. Yet, existing models analyzing check-in sequences fail to consider the semantics contained in these sequences, which closely reflect human visiting intentions and travel preferences, leading to an incomplete comprehension. Drawing inspiration from the exceptional semantic understanding and contextual information processing capabilities of large language models (LLMs) across various domains, we present Mobility-LLM, a novel framework that leverages LLMs to analyze check-in sequences for multiple tasks. Since LLMs cannot directly interpret check-ins, we reprogram these sequences to help LLMs comprehensively understand the semantics of human visiting intentions and travel preferences. Specifically, we introduce a visiting intention memory network (VIMN) to capture the visiting intentions at each record, along with a shared pool of human travel preference prompts (HTPP) to guide the LLM in understanding users' travel preferences. These components enhance the model's ability to extract and leverage semantic information from human mobility data effectively. Extensive experiments on four benchmark datasets and three downstream tasks demonstrate that our approach significantly outperforms existing models, underscoring the effectiveness of Mobility-LLM in advancing our understanding of human mobility data within LBS contexts.

AIDec 4, 2023
Local-Global History-aware Contrastive Learning for Temporal Knowledge Graph Reasoning

Wei Chen, Huaiyu Wan, Yuting Wu et al. · pku

Temporal knowledge graphs (TKGs) have been identified as a promising approach to represent the dynamics of facts along the timeline. The extrapolation of TKG is to predict unknowable facts happening in the future, holding significant practical value across diverse fields. Most extrapolation studies in TKGs focus on modeling global historical fact repeating and cyclic patterns, as well as local historical adjacent fact evolution patterns, showing promising performance in predicting future unknown facts. Yet, existing methods still face two major challenges: (1) They usually neglect the importance of historical information in KG snapshots related to the queries when encoding the local and global historical information; (2) They exhibit weak anti-noise capabilities, which hinders their performance when the inputs are contaminated with noise.To this end, we propose a novel \blue{Lo}cal-\blue{g}lobal history-aware \blue{C}ontrastive \blue{L}earning model (\blue{LogCL}) for TKG reasoning, which adopts contrastive learning to better guide the fusion of local and global historical information and enhance the ability to resist interference. Specifically, for the first challenge, LogCL proposes an entity-aware attention mechanism applied to the local and global historical facts encoder, which captures the key historical information related to queries. For the latter issue, LogCL designs four historical query contrast patterns, effectively improving the robustness of the model. The experimental results on four benchmark datasets demonstrate that LogCL delivers better and more robust performance than the state-of-the-art baselines.

LGApr 29, 2024
Micro-Macro Spatial-Temporal Graph-based Encoder-Decoder for Map-Constrained Trajectory Recovery

Tonglong Wei, Youfang Lin, Yan Lin et al.

Recovering intermediate missing GPS points in a sparse trajectory, while adhering to the constraints of the road network, could offer deep insights into users' moving behaviors in intelligent transportation systems. Although recent studies have demonstrated the advantages of achieving map-constrained trajectory recovery via an end-to-end manner, they still face two significant challenges. Firstly, existing methods are mostly sequence-based models. It is extremely hard for them to comprehensively capture the micro-semantics of individual trajectory, including the information of each GPS point and the movement between two GPS points. Secondly, existing approaches ignore the impact of the macro-semantics, i.e., the road conditions and the people's shared travel preferences reflected by a group of trajectories. To address the above challenges, we propose a Micro-Macro Spatial-Temporal Graph-based Encoder-Decoder (MM-STGED). Specifically, we model each trajectory as a graph to efficiently describe the micro-semantics of trajectory and design a novel message-passing mechanism to learn trajectory representations. Additionally, we extract the macro-semantics of trajectories and further incorporate them into a well-designed graph-based decoder to guide trajectory recovery. Extensive experiments conducted on sparse trajectories with three different sampling intervals that are respectively constructed from two real-world trajectory datasets demonstrate the superiority of our proposed model.

CVDec 6, 2023
Building Category Graphs Representation with Spatial and Temporal Attention for Visual Navigation

Xiaobo Hu, Youfang Lin, HeHe Fan et al.

Given an object of interest, visual navigation aims to reach the object's location based on a sequence of partial observations. To this end, an agent needs to 1) learn a piece of certain knowledge about the relations of object categories in the world during training and 2) look for the target object based on the pre-learned object category relations and its moving trajectory in the current unseen environment. In this paper, we propose a Category Relation Graph (CRG) to learn the knowledge of object category layout relations and a Temporal-Spatial-Region (TSR) attention architecture to perceive the long-term spatial-temporal dependencies of objects helping the navigation. We learn prior knowledge of object layout, establishing a category relationship graph to deduce the positions of specific objects. Subsequently, we introduced TSR to capture the relationships of objects in temporal, spatial, and regions within the observation trajectories. Specifically, we propose a Temporal attention module (T) to model the temporal structure of the observation sequence, which implicitly encodes the historical moving or trajectory information. Then, a Spatial attention module (S) is used to uncover the spatial context of the current observation objects based on the category relation graph and past observations. Last, a Region attention module (R) shifts the attention to the target-relevant region. Based on the visual representation extracted by our method, the agent can better perceive the environment and easily learn superior navigation policy. Experiments on AI2-THOR demonstrate our CRG-TSR method significantly outperforms existing methods regarding both effectiveness and efficiency. The code has been included in the supplementary material and will be publicly available.

52.4CVApr 26
Zoom In, Reason Out: Efficient Far-field Anomaly Detection in Expressway Surveillance Videos via Focused VLM Reasoning Guided by Bayesian Inference

Xiaowei Mao, Bowen Sui, Weijie Zhang et al.

Expressway video anomaly detection is essential for safety management. However, identifying anomalies across diverse scenes remains challenging, particularly for far-field targets exhibiting subtle abnormal vehicle motions. While Vision-Language Models (VLMs) demonstrate strong semantic reasoning capabilities, processing global frames causes attention dilution for these far-field objects and incurs prohibitive computational costs. To address these issues, we propose VIBES, an asynchronous collaborative framework utilizing VLMs guided by Bayesian inference. Specifically, to overcome poor generalization across varying expressway environments, we introduce an online Bayesian inference module. This module continuously evaluates vehicle trajectories to dynamically update the probabilistic boundaries of normal driving behaviors, serving as an asynchronous trigger to precisely localize anomalies in space and time. Instead of processing the continuous video stream, the VLM processes only the localized visual regions indicated by the trigger. This targeted visual input prevents attention dilution and enables accurate semantic reasoning. Extensive evaluations demonstrate that VIBES improves detection accuracy for far-field anomalies and reduces computational overhead, achieving high real-time efficiency and explainability while demonstrating generalization across diverse expressway conditions.

MAJan 9, 2025
CoDe: Communication Delay-Tolerant Multi-Agent Collaboration via Dual Alignment of Intent and Timeliness

Shoucheng Song, Youfang Lin, Sheng Han et al.

Communication has been widely employed to enhance multi-agent collaboration. Previous research has typically assumed delay-free communication, a strong assumption that is challenging to meet in practice. However, real-world agents suffer from channel delays, receiving messages sent at different time points, termed {\it{Asynchronous Communication}}, leading to cognitive biases and breakdowns in collaboration. This paper first defines two communication delay settings in MARL and emphasizes their harm to collaboration. To handle the above delays, this paper proposes a novel framework, Communication Delay-tolerant Multi-Agent Collaboration (CoDe). At first, CoDe learns an intent representation as messages through future action inference, reflecting the stable future behavioral trends of the agents. Then, CoDe devises a dual alignment mechanism of intent and timeliness to strengthen the fusion process of asynchronous messages. In this way, agents can extract the long-term intent of others, even from delayed messages, and selectively utilize the most recent messages that are relevant to their intent. Experimental results demonstrate that CoDe outperforms baseline algorithms in three MARL benchmarks without delay and exhibits robustness under fixed and time-varying delays.

LGDec 19, 2023
Time-Series Contrastive Learning against False Negatives and Class Imbalance

Xiyuan Jin, Jing Wang, Lei Liu et al.

As an exemplary self-supervised approach for representation learning, time-series contrastive learning has exhibited remarkable advancements in contemporary research. While recent contrastive learning strategies have focused on how to construct appropriate positives and negatives, in this study, we conduct theoretical analysis and find they have overlooked the fundamental issues: false negatives and class imbalance inherent in the InfoNCE loss-based framework. Therefore, we introduce a straightforward modification grounded in the SimCLR framework, universally adaptable to models engaged in the instance discrimination task. By constructing instance graphs to facilitate interactive learning among instances, we emulate supervised contrastive learning via the multiple-instances discrimination task, mitigating the harmful impact of false negatives. Moreover, leveraging the graph structure and few-labeled data, we perform semi-supervised consistency classification and enhance the representative ability of minority classes. We compared our method with the most popular time-series contrastive learning methods on four real-world time-series datasets and demonstrated our significant advantages in overall performance.

CVDec 4, 2023
Learning Robust Representations via Bidirectional Transition for Visual Reinforcement Learning

Xiaobo Hu, Youfang Lin, Yue Liu et al.

Visual reinforcement learning has proven effective in solving control tasks with high-dimensional observations. However, extracting reliable and generalizable representations from vision-based observations remains a central challenge. Inspired by the human thought process, when the representation extracted from the observation can predict the future and trace history, the representation is reliable and accurate in comprehending the environment. Based on this concept, we introduce a Bidirectional Transition (BiT) model, which leverages the ability to bidirectionally predict environmental transitions both forward and backward to extract reliable representations. Our model demonstrates competitive generalization performance and sample efficiency on two settings of the DeepMind Control suite. Additionally, we utilize robotic manipulation and CARLA simulators to demonstrate the wide applicability of our method.

LGMay 21, 2024
TrajCogn: Leveraging LLMs for Cognizing Movement Patterns and Travel Purposes from Trajectories

Zeyu Zhou, Yan Lin, Haomin Wen et al.

Spatio-temporal trajectories are crucial in various data mining tasks. It is important to develop a versatile trajectory learning method that performs different tasks with high accuracy. This involves effectively extracting two core aspects of information--movement patterns and travel purposes--from trajectories. However, this is challenging due to limitations in model capacity and the quality and scale of trajectory datasets. Meanwhile, large language models (LLMs) have shown great success in versatility by training on large-scale, high-quality datasets. Given the similarities between trajectories and sentences, there's potential to leverage LLMs to develop an effective trajectory learning method. However, standard LLMs are not designed to handle the unique spatio-temporal features of trajectories and cannot extract movement patterns and travel purposes. To address these challenges, we propose a model called TrajCogn that effectively utilizes LLMs to model trajectories. TrajCogn leverages the strengths of LLMs to create a versatile trajectory learning approach while addressing the limitations of standard LLMs. First, TrajCogn incorporates a novel trajectory semantic embedder that enables LLMs to process spatio-temporal features and extract movement patterns and travel purposes. Second, TrajCogn introduces a new trajectory prompt that integrates these patterns and purposes into LLMs, allowing the model to adapt to various tasks. Extensive experiments on two real-world datasets and two representative tasks demonstrate that TrajCogn successfully achieves its design goals. Codes are available at https://anonymous.4open.science/r/TrajCogn-5021.

35.1LGMar 31
DiSGMM: A Method for Time-varying Microscopic Weight Completion on Road Networks

Yan Lin, Jilin Hu, Shengnan Guo et al.

Microscopic road-network weights represent fine-grained, time-varying traffic conditions obtained from individual vehicles. An example is travel speeds associated with road segments as vehicles traverse them. These weights support tasks including traffic microsimulation and vehicle routing with reliability guarantees. We study the problem of time-varying microscopic weight completion. During a time slot, the available weights typically cover only some road segments. Weight completion recovers distributions for the weights of every road segment at the current time slot. This problem involves two challenges: (i) contending with two layers of sparsity, where weights are missing at both the network layer (many road segments lack weights) and the segment layer (a segment may have insufficient weights to enable accurate distribution estimation); and (ii) achieving a weight distribution representation that is closed-form and can capture complex conditions flexibly, including heavy tails and multiple clusters. To address these challenges, we propose DiSGMM that combines sparsity-aware embeddings with spatiotemporal modeling to leverage sparse known weights alongside learned segment properties and long-range correlations for distribution estimation. DiSGMM represents distributions of microscopic weights as learnable Gaussian mixture models, providing closed-form distributions capable of capturing complex conditions flexibly. Experiments on two real-world datasets show that DiSGMM can outperform state-of-the-art methods.

CVFeb 1
DRFormer: A Dual-Regularized Bidirectional Transformer for Person Re-identification

Ying Shu, Pujian Zhan, Huiqi Yang et al.

Both fine-grained discriminative details and global semantic features can contribute to solving person re-identification challenges, such as occlusion and pose variations. Vision foundation models (\textit{e.g.}, DINO) excel at mining local textures, and vision-language models (\textit{e.g.}, CLIP) capture strong global semantic difference. Existing methods predominantly rely on a single paradigm, neglecting the potential benefits of their integration. In this paper, we analyze the complementary roles of these two architectures and propose a framework to synergize their strengths by a \textbf{D}ual-\textbf{R}egularized Bidirectional \textbf{Transformer} (\textbf{DRFormer}). The dual-regularization mechanism ensures diverse feature extraction and achieves a better balance in the contributions of the two models. Extensive experiments on five benchmarks show that our method effectively harmonizes local and global representations, achieving competitive performance against state-of-the-art methods.

LGOct 20, 2025
TrajMamba: An Efficient and Semantic-rich Vehicle Trajectory Pre-training Model

Yichen Liu, Yan Lin, Shengnan Guo et al.

Vehicle GPS trajectories record how vehicles move over time, storing valuable travel semantics, including movement patterns and travel purposes. Learning travel semantics effectively and efficiently is crucial for real-world applications of trajectory data, which is hindered by two major challenges. First, travel purposes are tied to the functions of the roads and points-of-interest (POIs) involved in a trip. Such information is encoded in textual addresses and descriptions and introduces heavy computational burden to modeling. Second, real-world trajectories often contain redundant points, which harm both computational efficiency and trajectory embedding quality. To address these challenges, we propose TrajMamba, a novel approach for efficient and semantically rich vehicle trajectory learning. TrajMamba introduces a Traj-Mamba Encoder that captures movement patterns by jointly modeling both GPS and road perspectives of trajectories, enabling robust representations of continuous travel behaviors. It also incorporates a Travel Purpose-aware Pre-training procedure to integrate travel purposes into the learned embeddings without introducing extra overhead to embedding calculation. To reduce redundancy in trajectories, TrajMamba features a Knowledge Distillation Pre-training scheme to identify key trajectory points through a learnable mask generator and obtain effective compressed trajectory embeddings. Extensive experiments on two real-world datasets and three downstream tasks show that TrajMamba outperforms state-of-the-art baselines in both efficiency and accuracy.

AIJul 14, 2025
Improving monotonic optimization in heterogeneous multi-agent reinforcement learning with optimal marginal deterministic policy gradient

Xiaoyang Yu, Youfang Lin, Shuo Wang et al.

In heterogeneous multi-agent reinforcement learning (MARL), achieving monotonic improvement plays a pivotal role in enhancing performance. The HAPPO algorithm proposes a feasible solution by introducing a sequential update scheme, which requires independent learning with No Parameter-sharing (NoPS). However, heterogeneous MARL generally requires Partial Parameter-sharing (ParPS) based on agent grouping to achieve high cooperative performance. Our experiments prove that directly combining ParPS with the sequential update scheme leads to the policy updating baseline drift problem, thereby failing to achieve improvement. To solve the conflict between monotonic improvement and ParPS, we propose the Optimal Marginal Deterministic Policy Gradient (OMDPG) algorithm. First, we replace the sequentially computed $Q_ψ^s(s,a_{1:i})$ with the Optimal Marginal Q (OMQ) function $φ_ψ^*(s,a_{1:i})$ derived from Q-functions. This maintains MAAD's monotonic improvement while eliminating the conflict through optimal joint action sequences instead of sequential policy ratio calculations. Second, we introduce the Generalized Q Critic (GQC) as the critic function, employing pessimistic uncertainty-constrained loss to optimize different Q-value estimations. This provides the required Q-values for OMQ computation and stable baselines for actor updates. Finally, we implement a Centralized Critic Grouped Actor (CCGA) architecture that simultaneously achieves ParPS in local policy networks and accurate global Q-function computation. Experimental results in SMAC and MAMuJoCo environments demonstrate that OMDPG outperforms various state-of-the-art MARL baselines.

LGMay 19, 2025
TransferTraj: A Vehicle Trajectory Learning Model for Region and Task Transferability

Tonglong Wei, Yan Lin, Zeyu Zhou et al.

Vehicle GPS trajectories provide valuable movement information that supports various downstream tasks and applications. A desirable trajectory learning model should be able to transfer across regions and tasks without retraining, avoiding the need to maintain multiple specialized models and subpar performance with limited training data. However, each region has its unique spatial features and contexts, which are reflected in vehicle movement patterns and difficult to generalize. Additionally, transferring across different tasks faces technical challenges due to the varying input-output structures required for each task. Existing efforts towards transferability primarily involve learning embedding vectors for trajectories, which perform poorly in region transfer and require retraining of prediction modules for task transfer. To address these challenges, we propose TransferTraj, a vehicle GPS trajectory learning model that excels in both region and task transferability. For region transferability, we introduce RTTE as the main learnable module within TransferTraj. It integrates spatial, temporal, POI, and road network modalities of trajectories to effectively manage variations in spatial context distribution across regions. It also introduces a TRIE module for incorporating relative information of spatial features and a spatial context MoE module for handling movement patterns in diverse contexts. For task transferability, we propose a task-transferable input-output scheme that unifies the input-output structure of different tasks into the masking and recovery of modalities and trajectory points. This approach allows TransferTraj to be pre-trained once and transferred to different tasks without retraining. Extensive experiments on three real-world vehicle trajectory datasets under task transfer, zero-shot, and few-shot region transfer, validating TransferTraj's effectiveness.

LGMay 2, 2025
Harmonizing Intra-coherence and Inter-divergence in Ensemble Attacks for Adversarial Transferability

Zhaoyang Ma, Zhihao Wu, Wang Lu et al.

The development of model ensemble attacks has significantly improved the transferability of adversarial examples, but this progress also poses severe threats to the security of deep neural networks. Existing methods, however, face two critical challenges: insufficient capture of shared gradient directions across models and a lack of adaptive weight allocation mechanisms. To address these issues, we propose a novel method Harmonized Ensemble for Adversarial Transferability (HEAT), which introduces domain generalization into adversarial example generation for the first time. HEAT consists of two key modules: Consensus Gradient Direction Synthesizer, which uses Singular Value Decomposition to synthesize shared gradient directions; and Dual-Harmony Weight Orchestrator which dynamically balances intra-domain coherence, stabilizing gradients within individual models, and inter-domain diversity, enhancing transferability across models. Experimental results demonstrate that HEAT significantly outperforms existing methods across various datasets and settings, offering a new perspective and direction for adversarial attack research.

LGDec 6, 2024
A Survey and Benchmarking of Spatial-Temporal Traffic Data Imputation Models

Shengnan Guo, Tonglong Wei, Yiheng Huang et al.

Traffic data imputation is a critical preprocessing step in intelligent transportation systems, underpinning the reliability of downstream transportation services. Despite substantial progress in imputation models, model selection and development for practical applications remains challenging due to three key gaps: 1) the absence of a model taxonomy for traffic data imputation to trace the technological development and highlight their distinct features. 2) the lack of unified benchmarking pipeline for fair and reproducible model evaluation across standardized traffic datasets. 3) insufficient in-depth analysis that jointly compare models across multiple dimensions, including effectiveness, computational efficiency and robustness. To this end, this paper proposes practice-oriented taxonomies for traffic data missing patterns and imputation models, systematically cataloging real-world traffic data loss scenarios and analyzing the characteristics of existing models. We further introduce a unified benchmarking pipeline to comprehensively evaluate 11 representative models across various missing patterns and rates, assessing overall performance, performance under challenging scenarios, computational efficiency, and providing visualizations. This work aims to provide a holistic perspective on traffic data imputation and to serve as a practical guideline for model selection and application in intelligent transportation systems.

LGOct 18, 2024
PLMTrajRec: A Scalable and Generalizable Trajectory Recovery Method with Pre-trained Language Models

Tonglong Wei, Yan Lin, Youfang Lin et al.

Spatiotemporal trajectory data is crucial for various applications. However, issues such as device malfunctions and network instability often cause sparse trajectories, leading to lost detailed movement information. Recovering the missing points in sparse trajectories to restore the detailed information is thus essential. Despite recent progress, several challenges remain. First, the lack of large-scale dense trajectory data makes it difficult to train a trajectory recovery model from scratch. Second, the varying spatiotemporal correlations in sparse trajectories make it hard to generalize recovery across different sampling intervals. Third, the lack of location information complicates the extraction of road conditions for missing points. To address these challenges, we propose a novel trajectory recovery model called PLMTrajRec. It leverages the scalability of a pre-trained language model (PLM) and can be fine-tuned with only a limited set of dense trajectories. To handle different sampling intervals in sparse trajectories, we first convert each trajectory's sampling interval and movement features into natural language representations, allowing the PLM to recognize its interval. We then introduce a trajectory encoder to unify trajectories of varying intervals into a single interval and capture their spatiotemporal relationships. To obtain road conditions for missing points, we propose an area flow-guided implicit trajectory prompt, which models road conditions by collecting traffic flows in each region. We also introduce a road condition passing mechanism that uses observed points' road conditions to infer those of the missing points. Experiments on two public trajectory datasets with three sampling intervals each demonstrate the effectiveness, scalability, and generalization ability of PLMTrajRec.

LGJun 19, 2024
CombAlign: Enhancing Model Expressiveness in Unsupervised Graph Alignment

Songyang Chen, Yu Liu, Lei Zou et al.

Unsupervised graph alignment finds the node correspondence between a pair of attributed graphs by only exploiting graph structure and node features. One category of recent studies first computes the node representation and then matches nodes with the largest embedding-based similarity, while the other category reduces the problem to optimal transport (OT) via Gromov-Wasserstein learning. However, it remains largely unexplored in the model expressiveness, as well as how theoretical expressivity impacts prediction accuracy. We investigate the model expressiveness from two aspects. First, we characterize the model's discriminative power in distinguishing matched and unmatched node pairs across two graphs. Second, we study the model's capability of guaranteeing node matching properties such as one-to-one matching and mutual alignment. Motivated by our theoretical analysis, we put forward a hybrid approach named CombAlign with stronger expressive power. Specifically, we enable cross-dimensional feature interaction for OT-based learning and propose an embedding-based method inspired by the Weisfeiler-Lehman test. We also apply non-uniform marginals obtained from the embedding-based modules to OT as priors for more expressiveness. Based on that, we propose a traditional algorithm-based refinement, which combines our OT and embedding-based predictions using the ensemble learning strategy and reduces the problem to maximum weight matching. With carefully designed edge weights, we ensure those matching properties and further enhance prediction accuracy. By extensive experiments, we demonstrate a significant improvement of 14.5% in alignment accuracy compared to state-of-the-art approaches and confirm the soundness of our theoretical analysis.

LGFeb 11, 2024
UVTM: Universal Vehicle Trajectory Modeling with ST Feature Domain Generation

Yan Lin, Jilin Hu, Shengnan Guo et al.

Vehicle movement is frequently captured in the form of GPS trajectories, i.e., sequences of timestamped GPS locations. Such data is widely used for various tasks such as travel-time estimation, trajectory recovery, and trajectory prediction. A universal vehicle trajectory model could be applied to different tasks, removing the need to maintain multiple specialized models, thereby reducing computational and storage costs. However, creating such a model is challenging when the integrity of trajectory features is compromised, i.e., in scenarios where only partial features are available or the trajectories are sparse. To address these challenges, we propose the Universal Vehicle Trajectory Model (UVTM), which can effectively adapt to different tasks without excessive retraining. UVTM incorporates two specialized designs. First, it divides trajectory features into three distinct domains. Each domain can be masked and generated independently to accommodate tasks with only partially available features. Second, UVTM is pre-trained by reconstructing dense, feature-complete trajectories from sparse, feature-incomplete counterparts, enabling strong performance even when the integrity of trajectory features is compromised. Experiments involving four representative trajectory-related tasks on three real-world vehicle trajectory datasets provide insight into the performance of UVTM and offer evidence that it is capable of meeting its objectives.

AISep 3, 2023
A Survey on Service Route and Time Prediction in Instant Delivery: Taxonomy, Progress, and Prospects

Haomin Wen, Youfang Lin, Lixia Wu et al.

Instant delivery services, such as food delivery and package delivery, have achieved explosive growth in recent years by providing customers with daily-life convenience. An emerging research area within these services is service Route\&Time Prediction (RTP), which aims to estimate the future service route as well as the arrival time of a given worker. As one of the most crucial tasks in those service platforms, RTP stands central to enhancing user satisfaction and trimming operational expenditures on these platforms. Despite a plethora of algorithms developed to date, there is no systematic, comprehensive survey to guide researchers in this domain. To fill this gap, our work presents the first comprehensive survey that methodically categorizes recent advances in service route and time prediction. We start by defining the RTP challenge and then delve into the metrics that are often employed. Following that, we scrutinize the existing RTP methodologies, presenting a novel taxonomy of them. We categorize these methods based on three criteria: (i) type of task, subdivided into only-route prediction, only-time prediction, and joint route\&time prediction; (ii) model architecture, which encompasses sequence-based and graph-based models; and (iii) learning paradigm, including Supervised Learning (SL) and Deep Reinforcement Learning (DRL). Conclusively, we highlight the limitations of current research and suggest prospective avenues. We believe that the taxonomy, progress, and prospects introduced in this paper can significantly promote the development of this field.

LGJan 18, 2022
Conservative Distributional Reinforcement Learning with Safety Constraints

Hengrui Zhang, Youfang Lin, Sheng Han et al.

Safety exploration can be regarded as a constrained Markov decision problem where the expected long-term cost is constrained. Previous off-policy algorithms convert the constrained optimization problem into the corresponding unconstrained dual problem by introducing the Lagrangian relaxation technique. However, the cost function of the above algorithms provides inaccurate estimations and causes the instability of the Lagrange multiplier learning. In this paper, we present a novel off-policy reinforcement learning algorithm called Conservative Distributional Maximum a Posteriori Policy Optimization (CDMPO). At first, to accurately judge whether the current situation satisfies the constraints, CDMPO adapts distributional reinforcement learning method to estimate the Q-function and C-function. Then, CDMPO uses a conservative value function loss to reduce the number of violations of constraints during the exploration process. In addition, we utilize Weighted Average Proportional Integral Derivative (WAPID) to update the Lagrange multiplier stably. Empirical results show that the proposed method has fewer violations of constraints in the early exploration process. The final test results also illustrate that our method has better risk control.

CVNov 29, 2021
Agent-Centric Relation Graph for Object Visual Navigation

Xiaobo Hu, Youfang Lin, Shuo Wang et al.

Object visual navigation aims to steer an agent toward a target object based on visual observations. It is highly desirable to reasonably perceive the environment and accurately control the agent. In the navigation task, we introduce an Agent-Centric Relation Graph (ACRG) for learning the visual representation based on the relationships in the environment. ACRG is a highly effective structure that consists of two relationships, i.e., the horizontal relationship among objects and the distance relationship between the agent and objects . On the one hand, we design the Object Horizontal Relationship Graph (OHRG) that stores the relative horizontal location among objects. On the other hand, we propose the Agent-Target Distance Relationship Graph (ATDRG) that enables the agent to perceive the distance between the target and objects. For ATDRG, we utilize image depth to obtain the target distance and imply the vertical location to capture the distance relationship among objects in the vertical direction. With the above graphs, the agent can perceive the environment and output navigation actions. Experimental results in the artificial environment AI2-THOR demonstrate that ACRG significantly outperforms other state-of-the-art methods in unseen testing environments.

SPOct 15, 2021
SleepPriorCL: Contrastive Representation Learning with Prior Knowledge-based Positive Mining and Adaptive Temperature for Sleep Staging

Hongjun Zhang, Jing Wang, Qinfeng Xiao et al.

The objective of this paper is to learn semantic representations for sleep stage classification from raw physiological time series. Although supervised methods have gained remarkable performance, they are limited in clinical situations due to the requirement of fully labeled data. Self-supervised learning (SSL) based on contrasting semantically similar (positive) and dissimilar (negative) pairs of samples have achieved promising success. However, existing SSL methods suffer the problem that many semantically similar positives are still uncovered and even treated as negatives. In this paper, we propose a novel SSL approach named SleepPriorCL to alleviate the above problem. Advances of our approach over existing SSL methods are two-fold: 1) by incorporating prior domain knowledge into the training regime of SSL, more semantically similar positives are discovered without accessing ground-truth labels; 2) via investigating the influence of the temperature in contrastive loss, an adaptive temperature mechanism for each sample according to prior domain knowledge is further proposed, leading to better performance. Extensive experiments demonstrate that our method achieves state-of-the-art performance and consistently outperforms baselines.

SPSep 4, 2021
Multi-View Spatial-Temporal Graph Convolutional Networks with Domain Generalization for Sleep Stage Classification

Ziyu Jia, Youfang Lin, Jing Wang et al.

Sleep stage classification is essential for sleep assessment and disease diagnosis. Although previous attempts to classify sleep stages have achieved high classification performance, several challenges remain open: 1) How to effectively utilize time-varying spatial and temporal features from multi-channel brain signals remains challenging. Prior works have not been able to fully utilize the spatial topological information among brain regions. 2) Due to the many differences found in individual biological signals, how to overcome the differences of subjects and improve the generalization of deep neural networks is important. 3) Most deep learning methods ignore the interpretability of the model to the brain. To address the above challenges, we propose a multi-view spatial-temporal graph convolutional networks (MSTGCN) with domain generalization for sleep stage classification. Specifically, we construct two brain view graphs for MSTGCN based on the functional connectivity and physical distance proximity of the brain regions. The MSTGCN consists of graph convolutions for extracting spatial features and temporal convolutions for capturing the transition rules among sleep stages. In addition, attention mechanism is employed for capturing the most relevant spatial-temporal information for sleep stage classification. Finally, domain generalization and MSTGCN are integrated into a unified framework to extract subject-invariant sleep features. Experiments on two public datasets demonstrate that the proposed model outperforms the state-of-the-art baselines.