Yuankai Wu

LG
h-index13
36papers
1,050citations
Novelty51%
AI Score59

36 Papers

LGNov 24, 2022Code
Explainable and Safe Reinforcement Learning for Autonomous Air Mobility

Lei Wang, Hongyu Yang, Yi Lin et al.

Increasing traffic demands, higher levels of automation, and communication enhancements provide novel design opportunities for future air traffic controllers (ATCs). This article presents a novel deep reinforcement learning (DRL) controller to aid conflict resolution for autonomous free flight. Although DRL has achieved important advancements in this field, the existing works pay little attention to the explainability and safety issues related to DRL controllers, particularly the safety under adversarial attacks. To address those two issues, we design a fully explainable DRL framework wherein we: 1) decompose the coupled Q value learning model into a safety-awareness and efficiency (reach the target) one; and 2) use information from surrounding intruders as inputs, eliminating the needs of central controllers. In our simulated experiments, we show that by decoupling the safety-awareness and efficiency, we can exceed performance on free flight control tasks while dramatically improving explainability on practical. In addition, the safety Q learning module provides rich information about the safety situation of environments. To study the safety under adversarial attacks, we additionally propose an adversarial attack strategy that can impose both safety-oriented and efficiency-oriented attacks. The adversarial aims to minimize safety/efficiency by only attacking the agent at a few time steps. In the experiments, our attack strategy increases as many collisions as the uniform attack (i.e., attacking at every time step) by only attacking the agent four times less often, which provide insights into the capabilities and restrictions of the DRL in future ATC designs. The source code is publicly available at https://github.com/WLeiiiii/Gym-ATC-Attack-Project.

44.5LGMay 31
AdaKernel: Learning Adaptive Kernel Parameters for Spatiotemporal Graph Neural Networks

Zhongyue Zhang, Guangyin Jin, Yuxuan Liang et al.

Modeling spatial dependencies is central to spatiotemporal data analysis using Graph Neural Networks (GNNs). Traditional methods rely on distance-based kernels with predefined parameters, which restricts model capacity. Although generic adaptive mechanisms (e.g., Graph Attention Networks) offer flexibility, they often fail to capture the underlying geometric structure, performing worse than distance-based models in data-sparse scenarios. Addressing this, we revisit the kernel parameterization problem and theoretically prove that misspecified kernel parameters introduce unavoidable approximation errors in GNNs. To overcome this, we propose AdaKernel, a simple yet effective approach that learns adaptive kernel parameters within the neural network. Unlike methods that learn graph structures from scratch, AdaKernel adopts a structure-preserving strategy that optimizes the scale of physical interactions rather than discarding them. Extensive experiments on Kriging, Imputation, and Forecasting demonstrate that AdaKernel consistently improves various GNN architectures and outperforms model-agnostic adaptive baselines, validating that accurately learned kernel parameters are superior to both fixed priors and fully latent graph structures.

69.5CEMay 20
MulFSA: Multi-level Financial Sentiment Analysis Framework for Bond Market

Yiwei Liu, Junbo Wang, Lei Long et al.

Existing financial sentiment analysis methods often fail to capture the multi-faceted nature of risk in bond markets due to their single-level approach and neglect of temporal dynamics. We propose Multi-level Financial Sentiment Analysis (MulFSA) based on pre-trained language models (PLMs) and large language models (LLMs), a novel framework that systematically integrates firm-specific micro-level sentiment, industry-specific meso-level sentiment, and duration-aware smoothing to model the latency and persistence of textual impact. Applying MulFSA to the comprehensive Chinese bond market corpus constructed by us (2013-2023, 1.35M texts), we extracted a daily composite sentiment index. Empirical results show statistically measurable improvements in credit spread forecasting when incorporating sentiment (10.25% MAE and 11.94% MAPE reduction), with sentiment shifts closely correlating with major social risk events and firm-specific crises. Project Page: https://mulfsa.github.io/.

LGJul 14, 2022
Spatiotemporal Propagation Learning for Network-Wide Flight Delay Prediction

Yuankai Wu, Hongyu Yang, Yi Lin et al.

Demystifying the delay propagation mechanisms among multiple airports is fundamental to precise and interpretable delay prediction, which is crucial during decision-making for all aviation industry stakeholders. The principal challenge lies in effectively leveraging the spatiotemporal dependencies and exogenous factors related to the delay propagation. However, previous works only consider limited spatiotemporal patterns with few factors. To promote more comprehensive propagation modeling for delay prediction, we propose SpatioTemporal Propagation Network (STPN), a space-time separable graph convolutional network, which is novel in spatiotemporal dependency capturing. From the aspect of spatial relation modeling, we propose a multi-graph convolution model considering both geographic proximity and airline schedule. From the aspect of temporal dependency capturing, we propose a multi-head self-attentional mechanism that can be learned end-to-end and explicitly reason multiple kinds of temporal dependency of delay time series. We show that the joint spatial and temporal learning models yield a sum of the Kronecker product, which factors the spatiotemporal dependence into the sum of several spatial and temporal adjacency matrices. By this means, STPN allows cross-talk of spatial and temporal factors for modeling delay propagation. Furthermore, a squeeze and excitation module is added to each layer of STPN to boost meaningful spatiotemporal features. To this end, we apply STPN to multi-step ahead arrival and departure delay prediction in large-scale airport networks. To validate the effectiveness of our model, we experiment with two real-world delay datasets, including U.S and China flight delays; and we show that STPN outperforms state-of-the-art methods. In addition, counterfactuals produced by STPN show that it learns explainable delay propagation patterns.

LGDec 12, 2022
Spatial-temporal traffic modeling with a fusion graph reconstructed by tensor decomposition

Qin Li, Xuan Yang, Yong Wang et al.

Accurate spatial-temporal traffic flow forecasting is essential for helping traffic managers to take control measures and drivers to choose the optimal travel routes. Recently, graph convolutional networks (GCNs) have been widely used in traffic flow prediction owing to their powerful ability to capture spatial-temporal dependencies. The design of the spatial-temporal graph adjacency matrix is a key to the success of GCNs, and it is still an open question. This paper proposes reconstructing the binary adjacency matrix via tensor decomposition, and a traffic flow forecasting method is proposed. First, we reformulate the spatial-temporal fusion graph adjacency matrix into a three-way adjacency tensor. Then, we reconstructed the adjacency tensor via Tucker decomposition, wherein more informative and global spatial-temporal dependencies are encoded. Finally, a Spatial-temporal Synchronous Graph Convolutional module for localized spatial-temporal correlations learning and a Dilated Convolution module for global correlations learning are assembled to aggregate and learn the comprehensive spatial-temporal dependencies of the road network. Experimental results on four open-access datasets demonstrate that the proposed model outperforms state-of-the-art approaches in terms of the prediction performance and computational cost.

LGAug 2, 2023
Enhancing Representation Learning for Periodic Time Series with Floss: A Frequency Domain Regularization Approach

Chunwei Yang, Xiaoxu Chen, Lijun Sun et al.

Time series analysis is a fundamental task in various application domains, and deep learning approaches have demonstrated remarkable performance in this area. However, many real-world time series data exhibit significant periodic or quasi-periodic dynamics that are often not adequately captured by existing deep learning-based solutions. This results in an incomplete representation of the underlying dynamic behaviors of interest. To address this gap, we propose an unsupervised method called Floss that automatically regularizes learned representations in the frequency domain. The Floss method first automatically detects major periodicities from the time series. It then employs periodic shift and spectral density similarity measures to learn meaningful representations with periodic consistency. In addition, Floss can be easily incorporated into both supervised, semi-supervised, and unsupervised learning frameworks. We conduct extensive experiments on common time series classification, forecasting, and anomaly detection tasks to demonstrate the effectiveness of Floss. We incorporate Floss into several representative deep learning solutions to justify our design choices and demonstrate that it is capable of automatically discovering periodic dynamics and improving state-of-the-art deep learning models.

81.2LGMay 16Code
PULSE: Generative Phase Evolution for Non-Stationary Time Series Forecasting

Yangyou Liu, Zezhi Shao, Xinyu Chen et al.

Time series forecasting under non-stationarity faces a fundamental tension between capturing stable representations and adapting to distribution shifts. Existing methods implicitly rely on static historical assumptions, leading to a critical failure mode we term Phase Amnesia, where models become blind to the evolving global context. To resolve this, we formalize non-stationary dynamics through three physical hypotheses: wold decomposition, dynamical phase evolution, and heteroscedastic manifold generation. These principles inspire PULSE, a physics-informed, plug-and-play framework adopting a Disentangle--Evolve--Simulate design philosophy. Specifically, PULSE utilizes phase-anchored disentanglement to resolve optimization interference caused by dominant trends, employs a Phase Router to actively generate future trajectories, and introduces Statistic-Aware Mixup (SAM) to ensure robustness against out-of-distribution volatility. Empirically, PULSE enables a simple MLP backbone to achieve state-of-the-art or highly competitive performance across 12 real-world benchmarks. This validates that a correct physics-informed inductive bias is far more critical than raw architectural complexity for non-stationary forecasting. The code is available at: https://github.com/Gemost/PULSE.

LGOct 30, 2025Code
Aeolus: A Multi-structural Flight Delay Dataset

Lin Xu, Xinyun Yuan, Yuxuan Liang et al.

We introduce Aeolus, a large-scale Multi-modal Flight Delay Dataset designed to advance research on flight delay prediction and support the development of foundation models for tabular data. Existing datasets in this domain are typically limited to flat tabular structures and fail to capture the spatiotemporal dynamics inherent in delay propagation. Aeolus addresses this limitation by providing three aligned modalities: (i) a tabular dataset with rich operational, meteorological, and airportlevel features for over 50 million flights; (ii) a flight chain module that models delay propagation along sequential flight legs, capturing upstream and downstream dependencies; and (iii) a flight network graph that encodes shared aircraft, crew, and airport resource connections, enabling cross-flight relational reasoning. The dataset is carefully constructed with temporal splits, comprehensive features, and strict leakage prevention to support realistic and reproducible machine learning evaluation. Aeolus supports a broad range of tasks, including regression, classification, temporal structure modeling, and graph learning, serving as a unified benchmark across tabular, sequential, and graph modalities. We release baseline experiments and preprocessing tools to facilitate adoption. Aeolus fills a key gap for both domain-specific modeling and general-purpose structured data research.Our source code and data can be accessed at https://github.com/Flnny/Delay-data

CVFeb 24Code
PFGNet: A Fully Convolutional Frequency-Guided Peripheral Gating Network for Efficient Spatiotemporal Predictive Learning

Xinyong Cai, Changbin Sun, Yong Wang et al.

Spatiotemporal predictive learning (STPL) aims to forecast future frames from past observations and is essential across a wide range of applications. Compared with recurrent or hybrid architectures, pure convolutional models offer superior efficiency and full parallelism, yet their fixed receptive fields limit their ability to adaptively capture spatially varying motion patterns. Inspired by biological center-surround organization and frequency-selective signal processing, we propose PFGNet, a fully convolutional framework that dynamically modulates receptive fields through pixel-wise frequency-guided gating. The core Peripheral Frequency Gating (PFG) block extracts localized spectral cues and adaptively fuses multi-scale large-kernel peripheral responses with learnable center suppression, effectively forming spatially adaptive band-pass filters. To maintain efficiency, all large kernels are decomposed into separable 1D convolutions ($1 \times k$ followed by $k \times 1$), reducing per-channel computational cost from $O(k^2)$ to $O(2k)$. PFGNet enables structure-aware spatiotemporal modeling without recurrence or attention. Experiments on Moving MNIST, TaxiBJ, Human3.6M, and KTH show that PFGNet delivers SOTA or near-SOTA forecasting performance with substantially fewer parameters and FLOPs. Our code is available at https://github.com/fhjdqaq/PFGNet.

LGNov 19, 2023
TimeSQL: Improving Multivariate Time Series Forecasting with Multi-Scale Patching and Smooth Quadratic Loss

Site Mo, Haoxin Wang, Bixiong Li et al.

Time series is a special type of sequence data, a sequence of real-valued random variables collected at even intervals of time. The real-world multivariate time series comes with noises and contains complicated local and global temporal dynamics, making it difficult to forecast the future time series given the historical observations. This work proposes a simple and effective framework, coined as TimeSQL, which leverages multi-scale patching and smooth quadratic loss (SQL) to tackle the above challenges. The multi-scale patching transforms the time series into two-dimensional patches with different length scales, facilitating the perception of both locality and long-term correlations in time series. SQL is derived from the rational quadratic kernel and can dynamically adjust the gradients to avoid overfitting to the noises and outliers. Theoretical analysis demonstrates that, under mild conditions, the effect of the noises on the model with SQL is always smaller than that with MSE. Based on the two modules, TimeSQL achieves new state-of-the-art performance on the eight real-world benchmark datasets. Further ablation studies indicate that the key modules in TimeSQL could also enhance the results of other models for multivariate time series forecasting, standing as plug-and-play techniques.

48.7CVMar 24Code
WaveSFNet: A Wavelet-Based Codec and Spatial--Frequency Dual-Domain Gating Network for Spatiotemporal Prediction

Xinyong Cai, Runming Xie, Hu Chen et al.

Spatiotemporal predictive learning aims to forecast future frames from historical observations in an unsupervised manner, and is critical to a wide range of applications. The key challenge is to model long-range dynamics while preserving high-frequency details for sharp multi-step predictions. Existing efficient recurrent-free frameworks typically rely on strided convolutions or pooling for sampling, which tends to discard textures and boundaries, while purely spatial operators often struggle to balance local interactions with global propagation. To address these issues, we propose WaveSFNet, an efficient framework that unifies a wavelet-based codec with a spatial--frequency dual-domain gated spatiotemporal translator. The wavelet-based codec preserves high-frequency subband cues during downsampling and reconstruction. Meanwhile, the translator first injects adjacent-frame differences to explicitly enhance dynamic information, and then performs dual-domain gated fusion between large-kernel spatial local modeling and frequency-domain global modulation, together with gated channel interaction for cross-channel feature exchange. Extensive experiments demonstrate that WaveSFNet achieves competitive prediction accuracy on Moving MNIST, TaxiBJ, and WeatherBench, while maintaining low computational complexity. Our code is available at https://github.com/fhjdqaq/WaveSFNet.

60.2ROMay 19
Implicit Action Chunking for Smooth Continuous Control

Bosun Liang, Shuo Pei, Zirui Chen et al.

Reinforcement learning often produces high-frequency oscillatory control signals that undermine the safety and stability required for physical deployment. Explicit action chunking addresses this by predicting fixed-horizon trajectories but scales the policy output dimension proportionally with the horizon length, leading to optimization difficulties and incompatibility with standard step-wise interaction. To overcome these challenges, this paper proposes Dual-Window Smoothing (DWS), an implicit action chunking framework for smooth continuous control. Unlike explicit methods, DWS enforces temporal coherence without expanding the action space. It uses a dual-window design: an execution window that ensures physical smoothness through deterministic modulation, and a value window that aligns temporal-difference targets over the horizon to correct critic bias caused by open-loop execution. DWS also includes a lightweight actor-side temporal regularizer based on first-order action differences to promote global continuity. This design effectively bridges the gap between temporal abstraction and reactive step-wise control. Experiments on benchmarks including the DeepMind Control Suite and industrial energy management tasks show that DWS outperforms state-of-the-art (SOTA) baselines. In complex vision-based autonomous driving tasks, DWS achieves smoother control, safer behavior with reduced jitter, and attains a 100% success rate.

LGMay 24, 2024Code
NuwaTS: a Foundation Model Mending Every Incomplete Time Series

Jinguo Cheng, Chunwei Yang, Wanlin Cai et al.

Time series imputation is critical for many real-world applications and has been widely studied. However, existing models often require specialized designs tailored to specific missing patterns, variables, or domains which limits their generalizability. In addition, current evaluation frameworks primarily focus on domain-specific tasks and often rely on time-wise train/validation/test data splits, which fail to rigorously assess a model's ability to generalize across unseen variables or domains. In this paper, we present \textbf{NuwaTS}, a novel framework that repurposes Pre-trained Language Models (PLMs) for general time series imputation. Once trained, NuwaTS can be applied to impute missing data across any domain. We introduce specialized embeddings for each sub-series patch, capturing information about the patch, its missing data patterns, and its statistical characteristics. By combining contrastive learning with the imputation task, we train PLMs to create a versatile, one-for-all imputation model. Additionally, we employ a plug-and-play fine-tuning approach, enabling efficient adaptation to domain-specific tasks with minimal adjustments. To evaluate cross-variable and cross-domain generalization, we propose a new benchmarking protocol that partitions the datasets along the variable dimension. Experimental results on over seventeen million time series samples from diverse domains demonstrate that NuwaTS outperforms state-of-the-art domain-specific models across various datasets under the proposed benchmarking protocol. Furthermore, we show that NuwaTS generalizes to other time series tasks, such as forecasting. Our codes are available at https://github.com/Chengyui/NuwaTS.

LGJan 22, 2025Code
AirRadar: Inferring Nationwide Air Quality in China with Deep Neural Networks

Qiongyan Wang, Yutong Xia, Siru ZHong et al.

Monitoring real-time air quality is essential for safeguarding public health and fostering social progress. However, the widespread deployment of air quality monitoring stations is constrained by their significant costs. To address this limitation, we introduce \emph{AirRadar}, a deep neural network designed to accurately infer real-time air quality in locations lacking monitoring stations by utilizing data from existing ones. By leveraging learnable mask tokens, AirRadar reconstructs air quality features in unmonitored regions. Specifically, it operates in two stages: first capturing spatial correlations and then adjusting for distribution shifts. We validate AirRadar's efficacy using a year-long dataset from 1,085 monitoring stations across China, demonstrating its superiority over multiple baselines, even with varying degrees of unobserved data. The source code can be accessed at https://github.com/CityMind-Lab/AirRadar.

LGDec 31, 2023
MSGNet: Learning Multi-Scale Inter-Series Correlations for Multivariate Time Series Forecasting

Wanlin Cai, Yuxuan Liang, Xianggen Liu et al.

Multivariate time series forecasting poses an ongoing challenge across various disciplines. Time series data often exhibit diverse intra-series and inter-series correlations, contributing to intricate and interwoven dependencies that have been the focus of numerous studies. Nevertheless, a significant research gap remains in comprehending the varying inter-series correlations across different time scales among multiple time series, an area that has received limited attention in the literature. To bridge this gap, this paper introduces MSGNet, an advanced deep learning model designed to capture the varying inter-series correlations across multiple time scales using frequency domain analysis and adaptive graph convolution. By leveraging frequency domain analysis, MSGNet effectively extracts salient periodic patterns and decomposes the time series into distinct time scales. The model incorporates a self-attention mechanism to capture intra-series dependencies, while introducing an adaptive mixhop graph convolution layer to autonomously learn diverse inter-series correlations within each time scale. Extensive experiments are conducted on several real-world datasets to showcase the effectiveness of MSGNet. Furthermore, MSGNet possesses the ability to automatically learn explainable multi-scale inter-series correlations, exhibiting strong generalization capabilities even when applied to out-of-distribution samples.

AIAug 6, 2025Code
GeoSR: Cognitive-Agentic Framework for Probing Geospatial Knowledge Boundaries via Iterative Self-Refinement

Jinfan Tang, Kunming Wu, Ruifeng Gongxie et al.

Recent studies have extended the application of large language models (LLMs) to geographic problems, revealing surprising geospatial competence even without explicit spatial supervision. However, LLMs still face challenges in spatial consistency, multi-hop reasoning, and geographic bias. To address these issues, we propose GeoSR, a self-refining agentic reasoning framework that embeds core geographic principles -- most notably Tobler's First Law of Geography -- into an iterative prediction loop. In GeoSR, the reasoning process is decomposed into three collaborating agents: (1) a variable-selection agent that selects relevant covariates from the same location; (2) a point-selection agent that chooses reference predictions at nearby locations generated by the LLM in previous rounds; and (3) a refine agent that coordinates the iterative refinement process by evaluating prediction quality and triggering further rounds when necessary. This agentic loop progressively improves prediction quality by leveraging both spatial dependencies and inter-variable relationships. We validate GeoSR on tasks ranging from physical-world property estimation to socioeconomic prediction. Experimental results show consistent improvements over standard prompting strategies, demonstrating that incorporating geostatistical priors and spatially structured reasoning into LLMs leads to more accurate and equitable geospatial predictions. The code of GeoSR is available at https://github.com/JinfanTang/GeoSR.

LGFeb 6, 2024
Modeling Spatio-temporal Dynamical Systems with Neural Discrete Learning and Levels-of-Experts

Kun Wang, Hao Wu, Guibin Zhang et al.

In this paper, we address the issue of modeling and estimating changes in the state of the spatio-temporal dynamical systems based on a sequence of observations like video frames. Traditional numerical simulation systems depend largely on the initial settings and correctness of the constructed partial differential equations (PDEs). Despite recent efforts yielding significant success in discovering data-driven PDEs with neural networks, the limitations posed by singular scenarios and the absence of local insights prevent them from performing effectively in a broader real-world context. To this end, this paper propose the universal expert module -- that is, optical flow estimation component, to capture the evolution laws of general physical processes in a data-driven fashion. To enhance local insight, we painstakingly design a finer-grained physical pipeline, since local characteristics may be influenced by various internal contextual information, which may contradict the macroscopic properties of the whole system. Further, we harness currently popular neural discrete learning to unveil the underlying important features in its latent space, this process better injects interpretability, which can help us obtain a powerful prior over these discrete random variables. We conduct extensive experiments and ablations to demonstrate that the proposed framework achieves large performance margins, compared with the existing SOTA baselines.

CVFeb 27, 2024
ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily Living

Marsil Zakour, Partha Pratim Nath, Ludwig Lohmer et al.

Hand-Object Interactions (HOIs) are conditioned on spatial and temporal contexts like surrounding objects, previous actions, and future intents (for example, grasping and handover actions vary greatly based on objects proximity and trajectory obstruction). However, existing datasets for 4D HOI (3D HOI over time) are limited to one subject interacting with one object only. This restricts the generalization of learning-based HOI methods trained on those datasets. We introduce ADL4D, a dataset of up to two subjects interacting with different sets of objects performing Activities of Daily Living (ADL) like breakfast or lunch preparation activities. The transition between multiple objects to complete a certain task over time introduces a unique context lacking in existing datasets. Our dataset consists of 75 sequences with a total of 1.1M RGB-D frames, hand and object poses, and per-hand fine-grained action annotations. We develop an automatic system for multi-view multi-hand 3D pose annotation capable of tracking hand poses over time. We integrate and test it against publicly available datasets. Finally, we evaluate our dataset on the tasks of Hand Mesh Recovery (HMR) and Hand Action Segmentation (HAS).

LGDec 4, 2023
Rethinking Urban Mobility Prediction: A Super-Multivariate Time Series Forecasting Approach

Jinguo Cheng, Ke Li, Yuxuan Liang et al.

Long-term urban mobility predictions play a crucial role in the effective management of urban facilities and services. Conventionally, urban mobility data has been structured as spatiotemporal videos, treating longitude and latitude grids as fundamental pixels. Consequently, video prediction methods, relying on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), have been instrumental in this domain. In our research, we introduce a fresh perspective on urban mobility prediction. Instead of oversimplifying urban mobility data as traditional video data, we regard it as a complex multivariate time series. This perspective involves treating the time-varying values of each grid in each channel as individual time series, necessitating a thorough examination of temporal dynamics, cross-variable correlations, and frequency-domain insights for precise and reliable predictions. To address this challenge, we present the Super-Multivariate Urban Mobility Transformer (SUMformer), which utilizes a specially designed attention mechanism to calculate temporal and cross-variable correlations and reduce computational costs stemming from a large number of time series. SUMformer also employs low-frequency filters to extract essential information for long-term predictions. Furthermore, SUMformer is structured with a temporal patch merge mechanism, forming a hierarchical framework that enables the capture of multi-scale correlations. Consequently, it excels in urban mobility pattern modeling and long-term prediction, outperforming current state-of-the-art methods across three real-world datasets.

LGApr 20, 2025
Evaluating Temporal Plasticity in Foundation Time Series Models for Incremental Fine-tuning

Jia Liu, Cheng Jinguo, Xia Fang et al.

Time series foundation models excel at diverse time series forecasting tasks, but their capacity for continuous improvement through incremental learning remains unexplored. We present the first comprehensive study investigating these models' temporal plasticity - their ability to progressively enhance performance through continual learning while maintaining existing capabilities. Through experiments on real-world datasets exhibiting distribution shifts, we evaluate both conventional deep learning models and foundation models using a novel continual learning framework. Our findings reveal that while traditional models struggle with performance deterioration during incremental fine-tuning, foundation models like Time-MoE and Chronos demonstrate sustained improvement in predictive accuracy. This suggests that optimizing foundation model fine-tuning strategies may be more valuable than developing domain-specific small models. Our research introduces new evaluation methodologies and insights for developing foundation time series models with robust continuous learning capabilities.

69.3LGApr 6
Discrete Prototypical Memories for Federated Time Series Foundation Models

Liwei Deng, Qingxiang Liu, Xinhe Niu et al.

Leveraging Large Language Models (LLMs) as federated learning (FL)-based time series foundation models offers a promising way to transfer the generalization capabilities of LLMs to time series data while preserving access to private data. However, the semantic misalignment between time-series data and the text-centric latent space of existing LLMs often leads to degraded performance. Meanwhile, the parameter-sharing mechanism in existing FL methods model heterogeneous cross-domain time-series data into a unified continuous latent space, which contradicts the fact that time-series semantics frequently manifest as discrete and recurring regimes. To address these limitations, we propose \textsc{FeDPM}, a federated framework for time-series foundation models based on discrete prototypical memories. Specifically, we learn local prototypical memory priors for intra-domain time-series data. We then align cross-domain memories to promote a unified discrete latent space and introduce a domain-specific memory update mechanism to balance shared and personalized prototypical knowledge. Extensive experiments demonstrate the efficiency and effectiveness of \textsc{FeDPM}. The code is publicly available at https://anonymous.4open.science/r/FedUnit-64D1.

CYFeb 28, 2024
Machina Economicus: A New Paradigm for Prosumers in the Energy Internet of Smart Cities

Luyang Hou, Jun Yan, Yuankai Wu et al.

Energy Internet (EI) is emerging as new share economy platform for flexible local energy supplies in smart cities. Empowered by the Internet-of-Things (IoT) and Artificial Intelligence (AI), EI aims to unlock peer-to-peer energy trading and sharing among prosumers, who can adeptly switch roles between providers and consumers in localized energy markets with rooftop photovoltaic panels, vehicle-to-everything technologies, packetized energy management, etc. The integration of prosumers in EI, however, will encounter many challenges in modelling, analyzing, and designing an efficient, economic, and social-optimal platform for energy sharing, calling for advanced AI/IoT-based solutions to resource optimization, information exchange, and interaction protocols in the context of the share economy. In this study, we aim to introduce a recently emerged paradigm, Machina Economicus, to investigate the economic rationality in modelling, analysis, and optimization of AI/IoT-based EI prosumer behaviors. The new paradigm, built upon the theory of machine learning and mechanism design, will offer new angles to investigate the selfishness of AI through a game-theoretic perspective, revealing potential competition and collaborations resulting from the self-adaptive learning and decision-making capacity. This study will focus on how the introduction of AI will reshape prosumer behaviors on the EI, and how this paradigm will reveal new research questions and directions when AI meets the share economy. With an extensive case analysis in the literature, we will also shed light on potential solutions for advancements of AI in future smart cities.

LGAug 4, 2025
Revitalizing Canonical Pre-Alignment for Irregular Multivariate Time Series Forecasting

Ziyu Zhou, Yiming Huang, Yanyun Wang et al.

Irregular multivariate time series (IMTS), characterized by uneven sampling and inter-variate asynchrony, fuel many forecasting applications yet remain challenging to model efficiently. Canonical Pre-Alignment (CPA) has been widely adopted in IMTS modeling by padding zeros at every global timestamp, thereby alleviating inter-variate asynchrony and unifying the series length, but its dense zero-padding inflates the pre-aligned series length, especially when numerous variates are present, causing prohibitive compute overhead. Recent graph-based models with patching strategies sidestep CPA, but their local message passing struggles to capture global inter-variate correlations. Therefore, we posit that CPA should be retained, with the pre-aligned series properly handled by the model, enabling it to outperform state-of-the-art graph-based baselines that sidestep CPA. Technically, we propose KAFNet, a compact architecture grounded in CPA for IMTS forecasting that couples (1) Pre-Convolution module for sequence smoothing and sparsity mitigation, (2) Temporal Kernel Aggregation module for learnable compression and modeling of intra-series irregularity, and (3) Frequency Linear Attention blocks for the low-cost inter-series correlations modeling in the frequency domain. Experiments on multiple IMTS datasets show that KAFNet achieves state-of-the-art forecasting performance, with a 7.2$\times$ parameter reduction and a 8.4$\times$ training-inference acceleration.

CVJul 1, 2025
Multi-Modal Graph Convolutional Network with Sinusoidal Encoding for Robust Human Action Segmentation

Hao Xing, Kai Zhe Boey, Yuankai Wu et al.

Accurate temporal segmentation of human actions is critical for intelligent robots in collaborative settings, where a precise understanding of sub-activity labels and their temporal structure is essential. However, the inherent noise in both human pose estimation and object detection often leads to over-segmentation errors, disrupting the coherence of action sequences. To address this, we propose a Multi-Modal Graph Convolutional Network (MMGCN) that integrates low-frame-rate (e.g., 1 fps) visual data with high-frame-rate (e.g., 30 fps) motion data (skeleton and object detections) to mitigate fragmentation. Our framework introduces three key contributions. First, a sinusoidal encoding strategy that maps 3D skeleton coordinates into a continuous sin-cos space to enhance spatial representation robustness. Second, a temporal graph fusion module that aligns multi-modal inputs with differing resolutions via hierarchical feature aggregation, Third, inspired by the smooth transitions inherent to human actions, we design SmoothLabelMix, a data augmentation technique that mixes input sequences and labels to generate synthetic training examples with gradual action transitions, enhancing temporal consistency in predictions and reducing over-segmentation artifacts. Extensive experiments on the Bimanual Actions Dataset, a public benchmark for human-object interaction understanding, demonstrate that our approach outperforms state-of-the-art methods, especially in action segmentation accuracy, achieving F1@10: 94.5% and F1@25: 92.8%.

AIJun 20, 2025
Reinforcement Learning for Hybrid Charging Stations Planning and Operation Considering Fixed and Mobile Chargers

Yanchen Zhu, Honghui Zou, Chufan Liu et al.

The success of vehicle electrification relies on efficient and adaptable charging infrastructure. Fixed-location charging stations often suffer from underutilization or congestion due to fluctuating demand, while mobile chargers offer flexibility by relocating as needed. This paper studies the optimal planning and operation of hybrid charging infrastructures that combine both fixed and mobile chargers within urban road networks. We formulate the Hybrid Charging Station Planning and Operation (HCSPO) problem, jointly optimizing the placement of fixed stations and the scheduling of mobile chargers. A charging demand prediction model based on Model Predictive Control (MPC) supports dynamic decision-making. To solve the HCSPO problem, we propose a deep reinforcement learning approach enhanced with heuristic scheduling. Experiments on real-world urban scenarios show that our method improves infrastructure availability - achieving up to 244.4% increase in coverage - and reduces user inconvenience with up to 79.8% shorter waiting times, compared to existing solutions.

CVFeb 13, 2025
Memory-based Ensemble Learning in CMR Semantic Segmentation

Yiwei Liu, Ziyi Wu, Liang Zhong et al.

Existing models typically segment either the entire 3D frame or 2D slices independently to derive clinical functional metrics from ventricular segmentation in cardiac cine sequences. While performing well overall, they struggle at the end slices. To address this, we leverage spatial continuity to extract global uncertainty from segmentation variance and use it as memory in our ensemble learning method, Streaming, for classifier weighting, balancing overall and end-slice performance. Additionally, we introduce the End Coefficient (EC) to quantify end-slice accuracy. Experiments on ACDC and M&Ms datasets show that our framework achieves near-state-of-the-art Dice Similarity Coefficient (DSC) and outperforms all models on end-slice performance, improving patient-specific segmentation accuracy.

LGSep 24, 2021
Spatial Aggregation and Temporal Convolution Networks for Real-time Kriging

Yuankai Wu, Dingyi Zhuang, Mengying Lei et al.

Spatiotemporal kriging is an important application in spatiotemporal data analysis, aiming to recover/interpolate signals for unsampled/unobserved locations based on observed signals. The principle challenge for spatiotemporal kriging is how to effectively model and leverage the spatiotemporal dependencies within the data. Recently, graph neural networks (GNNs) have shown great promise for spatiotemporal kriging tasks. However, standard GNNs often require a carefully designed adjacency matrix and specific aggregation functions, which are inflexible for general applications/problems. To address this issue, we present SATCN -- Spatial Aggregation and Temporal Convolution Networks -- a universal and flexible framework to perform spatiotemporal kriging for various spatiotemporal datasets without the need for model specification. Specifically, we propose a novel spatial aggregation network (SAN) inspired by Principal Neighborhood Aggregation, which uses multiple aggregation functions to help one node gather diverse information from its neighbors. To exclude information from unsampled nodes, a masking strategy that prevents the unsampled sensors from sending messages to their neighborhood is introduced to SAN. We capture temporal dependencies by the temporal convolutional networks, which allows our model to cope with data of diverse sizes. To make SATCN generalizable to unseen nodes and even unseen graph structures, we employ an inductive strategy to train SATCN. We conduct extensive experiments on three real-world spatiotemporal datasets, including traffic speed and climate recordings. Our results demonstrate the superiority of SATCN over traditional and GNN-based kriging models.

LGSep 6, 2021
Individual Mobility Prediction via Attentive Marked Temporal Point Processes

Yuankai Wu, Zhanhong Cheng, Lijun Sun

Individual mobility prediction is an essential task for transportation demand management and traffic system operation. There exist a large body of works on modeling location sequence and predicting the next location of users; however, little attention is paid to the prediction of the next trip, which is governed by the strong spatiotemporal dependencies between diverse attributes, including trip start time $t$, origin $o$, and destination $d$. To fill this gap, in this paper we propose a novel point process-based model -- Attentive Marked temporal point processes (AMTPP) -- to model human mobility and predict the whole trip $(t,o,d)$ in a joint manner. To encode the influence of history trips, AMTPP employs the self-attention mechanism with a carefully designed positional embedding to capture the daily/weekly periodicity and regularity in individual travel behavior. Given the unique peaked nature of inter-event time in human behavior, we use an asymmetric log-Laplace mixture distribution to precisely model the distribution of trip start time $t$. Furthermore, an origin-destination (OD) matrix learning block is developed to model the relationship between every origin and destination pair. Experimental results on two large metro trip datasets demonstrate the superior performance of AMTPP.

LGMay 21, 2021
Low-Rank Hankel Tensor Completion for Traffic Speed Estimation

Xudong Wang, Yuankai Wu, Dingyi Zhuang et al.

This paper studies the traffic state estimation (TSE) problem using sparse observations from mobile sensors. Most existing TSE methods either rely on well-defined physical traffic flow models or require large amounts of simulation data as input to train machine learning models. Different from previous studies, we propose a purely data-driven and model-free solution in this paper. We consider the TSE as a spatiotemporal matrix completion/interpolation problem, and apply spatiotemporal delay embedding to transform the original incomplete matrix into a fourth-order Hankel structured tensor. By imposing a low-rank assumption on this tensor structure, we can approximate and characterize both global and local spatiotemporal patterns in a data-driven manner. We use the truncated nuclear norm of a balanced spatiotemporal unfolding -- in which each column represents the vectorization of a small patch in the original matrix -- to approximate the tensor rank. An efficient solution algorithm based on the Alternating Direction Method of Multipliers (ADMM) is developed for model learning. The proposed framework only involves two hyperparameters, spatial and temporal window lengths, which are easy to set given the degree of data sparsity. We conduct numerical experiments on real-world high-resolution trajectory data, and our results demonstrate the effectiveness and superiority of the proposed model in some challenging scenarios.

LGMar 14, 2021
BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks

Manoj Rohit Vemparala, Alexander Frickenstein, Nael Fasfous et al.

Deploying convolutional neural networks (CNNs) for embedded applications presents many challenges in balancing resource-efficiency and task-related accuracy. These two aspects have been well-researched in the field of CNN compression. In real-world applications, a third important aspect comes into play, namely the robustness of the CNN. In this paper, we thoroughly study the robustness of uncompressed, distilled, pruned and binarized neural networks against white-box and black-box adversarial attacks (FGSM, PGD, C&W, DeepFool, LocalSearch and GenAttack). These new insights facilitate defensive training schemes or reactive filtering methods, where the attack is detected and the input is discarded and/or cleaned. Experimental results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks (BNNs) such as XNOR-Net and ABC-Net, trained on CIFAR-10 and ImageNet datasets. We present evaluation methods to simplify the comparison between CNNs under different attack schemes using loss/accuracy levels, stress-strain graphs, box-plots and class activation mapping (CAM). Our analysis reveals susceptible behavior of uncompressed and pruned CNNs against all kinds of attacks. The distilled models exhibit their strength against all white box attacks with an exception of C&W. Furthermore, binary neural networks exhibit resilient behavior compared to their baselines and other compressed variants.

MLJul 6, 2020
Efficient Connected and Automated Driving System with Multi-agent Graph Reinforcement Learning

Tianyu Shi, Jiawei Wang, Yuankai Wu et al.

Connected and automated vehicles (CAVs) have attracted more and more attention recently. The fast actuation time allows them having the potential to promote the efficiency and safety of the whole transportation system. Due to technical challenges, there will be a proportion of vehicles that can be equipped with automation while other vehicles are without automation. Instead of learning a reliable behavior for ego automated vehicle, we focus on how to improve the outcomes of the total transportation system by allowing each automated vehicle to learn cooperation with each other and regulate human-driven traffic flow. One of state of the art method is using reinforcement learning to learn intelligent decision making policy. However, direct reinforcement learning framework cannot improve the performance of the whole system. In this article, we demonstrate that considering the problem in multi-agent setting with shared policy can help achieve better system performance than non-shared policy in single-agent setting. Furthermore, we find that utilization of attention mechanism on interaction features can capture the interplay between each agent in order to boost cooperation. To the best of our knowledge, while previous automated driving studies mainly focus on enhancing individual's driving performance, this work serves as a starting point for research on system-level multi-agent cooperation performance using graph information sharing. We conduct extensive experiments in car-following and unsignalized intersection settings. The results demonstrate that CAVs controlled by our method can achieve the best performance against several state of the art baselines.

LGJun 13, 2020
Inductive Graph Neural Networks for Spatiotemporal Kriging

Yuankai Wu, Dingyi Zhuang, Aurelie Labbe et al.

Time series forecasting and spatiotemporal kriging are the two most important tasks in spatiotemporal data analysis. Recent research on graph neural networks has made substantial progress in time series forecasting, while little attention has been paid to the kriging problem -- recovering signals for unsampled locations/sensors. Most existing scalable kriging methods (e.g., matrix/tensor completion) are transductive, and thus full retraining is required when we have a new sensor to interpolate. In this paper, we develop an Inductive Graph Neural Network Kriging (IGNNK) model to recover data for unsampled sensors on a network/graph structure. To generalize the effect of distance and reachability, we generate random subgraphs as samples and reconstruct the corresponding adjacency matrix for each sample. By reconstructing all signals on each sample subgraph, IGNNK can effectively learn the spatial message passing mechanism. Empirical results on several real-world spatiotemporal datasets demonstrate the effectiveness of our model. In addition, we also find that the learned model can be successfully transferred to the same type of kriging tasks on an unseen dataset. Our results show that: 1) GNN is an efficient and effective tool for spatial kriging; 2) inductive GNNs can be trained using dynamic adjacency matrices; 3) a trained model can be transferred to new graph structures and 4) IGNNK can be used to generate virtual sensors.

SOC-PHMay 10, 2020
Non-recurrent Traffic Congestion Detection with a Coupled Scalable Bayesian Robust Tensor Factorization Model

Qin Li, Huachun Tan, Xizhu Jiang et al.

Non-recurrent traffic congestion (NRTC) usually brings unexpected delays to commuters. Hence, it is critical to accurately detect and recognize the NRTC in a real-time manner. The advancement of road traffic detectors and loop detectors provides researchers with a large-scale multivariable temporal-spatial traffic data, which allows the deep research on NRTC to be conducted. However, it remains a challenging task to construct an analytical framework through which the natural spatial-temporal structural properties of multivariable traffic information can be effectively represented and exploited to better understand and detect NRTC. In this paper, we present a novel analytical training-free framework based on coupled scalable Bayesian robust tensor factorization (Coupled SBRTF). The framework can couple multivariable traffic data including traffic flow, road speed, and occupancy through sharing a similar or the same sparse structure. And, it naturally captures the high-dimensional spatial-temporal structural properties of traffic data by tensor factorization. With its entries revealing the distribution and magnitude of NRTC, the shared sparse structure of the framework compasses sufficiently abundant information about NRTC. While the low-rank part of the framework, expresses the distribution of general expected traffic condition as an auxiliary product. Experimental results on real-world traffic data show that the proposed method outperforms coupled Bayesian robust principal component analysis (coupled BRPCA), the rank sparsity tensor decomposition (RSTD), and standard normal deviates (SND) in detecting NRTC. The proposed method performs even better when only traffic data in weekdays are utilized, and hence can provide more precise estimation of NRTC for daily commuters.

ROApr 18, 2019
Efficient Motion Planning for Automated Lane Change based on Imitation Learning and Mixed-Integer Optimization

Chenyang Xi, Tianyu Shi, Yuankai Wu et al.

Intelligent motion planning is one of the core components in automated vehicles, which has received extensive interests. Traditional motion planning methods suffer from several drawbacks in terms of optimality, efficiency and generalization capability. Sampling based methods cannot guarantee the optimality of the generated trajectories. Whereas the optimization-based methods are not able to perform motion planning in real-time, and limited by the simplified formalization. In this work, we propose a learning-based approach to handle those shortcomings. Mixed Integer Quadratic Problem based optimization (MIQP) is used to generate the optimal lane-change trajectories which served as the training dataset for learning-based action generation algorithms. A hierarchical supervised learning model is devised to make the fast lane-change decision. Numerous experiments have been conducted to evaluate the optimality, efficiency, and generalization capability of the proposed approach. The experimental results indicate that the proposed model outperforms several commonly used motion planning baselines.

LGOct 25, 2018
Differential Variable Speed Limits Control for Freeway Recurrent Bottlenecks via Deep Reinforcement learning

Yuankai Wu, Huachun Tan, Bin Ran

Variable speed limits (VSL) control is a flexible way to improve traffic condition,increase safety and reduce emission. There is an emerging trend of using reinforcement learning technique for VSL control and recent studies have shown promising results. Currently, deep learning is enabling reinforcement learning to develope autonomous control agents for problems that were previously intractable. In this paper, we propose a more effective deep reinforcement learning (DRL) model for differential variable speed limits (DVSL) control, in which the dynamic and different speed limits among lanes can be imposed. The proposed DRL models use a novel actor-critic architecture which can learn a large number of discrete speed limits in a continues action space. Different reward signals, e.g. total travel time, bottleneck speed, emergency braking, and vehicular emission are used to train the DVSL controller, and comparison between these reward signals are conducted. We test proposed DRL baased DVSL controllers on a simulated freeway recurrent bottleneck. Results show that the efficiency, safety and emissions can be improved by the proposed method. We also show some interesting findings through the visulization of the control policies generated from DRL models.

CVDec 3, 2016
Short-term traffic flow forecasting with spatial-temporal correlation in a hybrid deep learning framework

Yuankai Wu, Huachun Tan

Deep learning approaches have reached a celebrity status in artificial intelligence field, its success have mostly relied on Convolutional Networks (CNN) and Recurrent Networks. By exploiting fundamental spatial properties of images and videos, the CNN always achieves dominant performance on visual tasks. And the Recurrent Networks (RNN) especially long short-term memory methods (LSTM) can successfully characterize the temporal correlation, thus exhibits superior capability for time series tasks. Traffic flow data have plentiful characteristics on both time and space domain. However, applications of CNN and LSTM approaches on traffic flow are limited. In this paper, we propose a novel deep architecture combined CNN and LSTM to forecast future traffic flow (CLTFP). An 1-dimension CNN is exploited to capture spatial features of traffic flow, and two LSTMs are utilized to mine the short-term variability and periodicities of traffic flow. Given those meaningful features, the feature-level fusion is performed to achieve short-term forecasting. The proposed CLTFP is compared with other popular forecasting methods on an open datasets. Experimental results indicate that the CLTFP has considerable advantages in traffic flow forecasting. in additional, the proposed CLTFP is analyzed from the view of Granger Causality, and several interesting properties of CLTFP are discovered and discussed .