LGFeb 20, 2023Code
Because Every Sensor Is Unique, so Is Every Pair: Handling Dynamicity in Traffic ForecastingArian Prabowo, Wei Shao, Hao Xue et al.
Traffic forecasting is a critical task to extract values from cyber-physical infrastructures, which is the backbone of smart transportation. However owing to external contexts, the dynamics at each sensor are unique. For example, the afternoon peaks at sensors near schools are more likely to occur earlier than those near residential areas. In this paper, we first analyze real-world traffic data to show that each sensor has a unique dynamic. Further analysis also shows that each pair of sensors also has a unique dynamic. Then, we explore how node embedding learns the unique dynamics at every sensor location. Next, we propose a novel module called Spatial Graph Transformers (SGT) where we use node embedding to leverage the self-attention mechanism to ensure that the information flow between two sensors is adaptive with respect to the unique dynamic of each pair. Finally, we present Graph Self-attention WaveNet (G-SWaN) to address the complex, non-linear spatiotemporal traffic dynamics. Through empirical experiments on four real-world, open datasets, we show that the proposed method achieves superior performance on both traffic speed and flow forecasting. Code is available at: https://github.com/aprbw/G-SWaN
45.9AIJun 4
An Infectious Disease Spread Simulation Based on Large Language Model Decision MakingYonchanok Khaokaew, Ruochen Kong, Andreas Zufle et al.
Modelling individual decision-making during infectious disease outbreaks is crucial for understanding behavioural dynamics and informing effective public health interventions. Prior work has shown that large language models can simulate realistic human behaviour by generating agent decisions based on demographic prompts and situational context. We build on this foundation with a spatially grounded, agent-based simulation framework that integrates LLM-generated decisions about self-reported influenza-like illness into a census-based synthetic population of agents. Location is treated as a central feature: agents are assigned to spatial units within cities, capturing the spatial distributions of different demographic groups using real-world census data and enabling geographically diverse behavioural modelling. We implement and compare three decision scenarios, independent reasoning, household influence, and message framing, and simulate self-reporting outcomes in San Francisco and Atlanta. Results reveal that income and education are the dominant drivers of reporting rate variation, with smaller but consistent effects from geography, LLM model choice, and message framing. Our framework generates synthetic data that captures both social and geographic heterogeneity, supporting spatial epidemiological modelling and bias-aware behavioural analysis.
LGSep 11, 2022
Leveraging Language Foundation Models for Human Mobility ForecastingHao Xue, Bhanu Prakash Voutharoja, Flora D. Salim
In this paper, we propose a novel pipeline that leverages language foundation models for temporal sequential pattern mining, such as for human mobility forecasting tasks. For example, in the task of predicting Place-of-Interest (POI) customer flows, typically the number of visits is extracted from historical logs, and only the numerical data are used to predict visitor flows. In this research, we perform the forecasting task directly on the natural language input that includes all kinds of information such as numerical values and contextual semantic information. Specific prompts are introduced to transform numerical temporal sequences into sentences so that existing language models can be directly applied. We design an AuxMobLCast pipeline for predicting the number of visitors in each POI, integrating an auxiliary POI category classification task with the encoder-decoder architecture. This research provides empirical evidence of the effectiveness of the proposed AuxMobLCast pipeline to discover sequential patterns in mobility forecasting tasks. The results, evaluated on three real-world datasets, demonstrate that pre-trained language foundation models also have good performance in forecasting temporal sequences. This study could provide visionary insights and lead to new research directions for predicting human mobility.
75.0LGMay 29
GlucoFM: A Dual-Stream Foundation Model for Continuous Glucose MonitoringZechen Li, Keerthana Natarajan, Weizhi Zhang et al.
Continuous glucose monitoring (CGM) provides a dense view of daily metabolic physiology, yet existing generic time-series and CGM-specific foundation models often encode glucose traces as entangled single-stream sequences, leaving the distinct temporal structure of glycemic dynamics only implicitly modeled. We present GlucoFM, a lightweight CGM foundation model that aligns irregular recordings to a 24-hour chronological grid, preserves observation masks, and decomposes glucose dynamics into slow physiological state and transient event streams, capturing low-frequency glycemic baselines and short-term deviations that may reflect acute physiological responses or sensor artifacts. GlucoFM is pretrained on 109,066 hours of unlabeled CGM recordings from 477 subjects with two complementary objectives: masked contextual latent prediction over fused daily representations and temporal dynamics prediction over state and event streams. Across four diverse cohorts and seven clinical prediction tasks, GlucoFM achieves the strongest subject-disjoint linear-probing performance among evaluated baselines, improving average PR-AUC by 4.1 points over the best CGM-specific foundation model. Its gains are most pronounced on core metabolic outcomes, leading PR-AUC on all diabetes-risk and $β$-cell dysfunction tasks and on 3 of 4 insulin-resistance tasks. GlucoFM also achieves the best overall cross-dataset transfer performance and strong few-shot adaptation among evaluated methods, and consistent gains when aggregating multiple days for subject-level prediction, highlighting physiology-aware decomposition as an effective inductive bias for transferable CGM representation learning.
MESep 20, 2022
PromptCast: A New Prompt-based Learning Paradigm for Time Series ForecastingHao Xue, Flora D. Salim
This paper presents a new perspective on time series forecasting. In existing time series forecasting methods, the models take a sequence of numerical values as input and yield numerical values as output. The existing SOTA models are largely based on the Transformer architecture, modified with multiple encoding mechanisms to incorporate the context and semantics around the historical data. Inspired by the successes of pre-trained language foundation models, we pose a question about whether these models can also be adapted to solve time-series forecasting. Thus, we propose a new forecasting paradigm: prompt-based time series forecasting (PromptCast). In this novel task, the numerical input and output are transformed into prompts and the forecasting task is framed in a sentence-to-sentence manner, making it possible to directly apply language models for forecasting purposes. To support and facilitate the research of this task, we also present a large-scale dataset (PISA) that includes three real-world forecasting scenarios. We evaluate different SOTA numerical-based forecasting methods and language generation models. The benchmark results with various forecasting settings demonstrate the proposed PromptCast with language generation models is a promising research direction. Additionally, in comparison to conventional numerical-based forecasting, PromptCast shows a much better generalization ability under the zero-shot setting.
CVJul 31, 2022
COCOA: Cross Modality Contrastive Learning for Sensor DataShohreh Deldari, Hao Xue, Aaqib Saeed et al.
Self-Supervised Learning (SSL) is a new paradigm for learning discriminative representations without labelled data and has reached comparable or even state-of-the-art results in comparison to supervised counterparts. Contrastive Learning (CL) is one of the most well-known approaches in SSL that attempts to learn general, informative representations of data. CL methods have been mostly developed for applications in computer vision and natural language processing where only a single sensor modality is used. A majority of pervasive computing applications, however, exploit data from a range of different sensor modalities. While existing CL methods are limited to learning from one or two data sources, we propose COCOA (Cross mOdality COntrastive leArning), a self-supervised model that employs a novel objective function to learn quality representations from multisensor data by computing the cross-correlation between different data modalities and minimizing the similarity between irrelevant instances. We evaluate the effectiveness of COCOA against eight recently introduced state-of-the-art self-supervised models, and two supervised baselines across five public datasets. We show that COCOA achieves superior classification performance to all other approaches. Also, COCOA is far more label-efficient than the other baselines including the fully supervised model using only one-tenth of available labelled data.
LGJun 6, 2022
Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal DataShohreh Deldari, Hao Xue, Aaqib Saeed et al.
Recently, Self-Supervised Representation Learning (SSRL) has attracted much attention in the field of computer vision, speech, natural language processing (NLP), and recently, with other types of modalities, including time series from sensors. The popularity of self-supervised learning is driven by the fact that traditional models typically require a huge amount of well-annotated data for training. Acquiring annotated data can be a difficult and costly process. Self-supervised methods have been introduced to improve the efficiency of training data through discriminative pre-training of models using supervisory signals that have been freely obtained from the raw data. Unlike existing reviews of SSRL that have pre-dominately focused upon methods in the fields of CV or NLP for a single modality, we aim to provide the first comprehensive review of multimodal self-supervised learning methods for temporal data. To this end, we 1) provide a comprehensive categorization of existing SSRL methods, 2) introduce a generic pipeline by defining the key components of a SSRL framework, 3) compare existing models in terms of their objective function, network architecture and potential applications, and 4) review existing multimodal techniques in each category and various modalities. Finally, we present existing weaknesses and future opportunities. We believe our work develops a perspective on the requirements of SSRL in domains that utilise multimodal and/or temporal data
LGJul 13, 2022
Modeling Long-term Dependencies and Short-term Correlations in Patient Journey Data with Temporal Attention Networks for Health PredictionYuxi Liu, Zhenhao Zhang, Antonio Jimeno Yepes et al.
Building models for health prediction based on Electronic Health Records (EHR) has become an active research area. EHR patient journey data consists of patient time-ordered clinical events/visits from patients. Most existing studies focus on modeling long-term dependencies between visits, without explicitly taking short-term correlations between consecutive visits into account, where irregular time intervals, incorporated as auxiliary information, are fed into health prediction models to capture latent progressive patterns of patient journeys. We present a novel deep neural network with four modules to take into account the contributions of various variables for health prediction: i) the Stacked Attention module strengthens the deep semantics in clinical events within each patient journey and generates visit embeddings, ii) the Short-Term Temporal Attention module models short-term correlations between consecutive visit embeddings while capturing the impact of time intervals within those visit embeddings, iii) the Long-Term Temporal Attention module models long-term dependencies between visit embeddings while capturing the impact of time intervals within those visit embeddings, iv) and finally, the Coupled Attention module adaptively aggregates the outputs of Short-Term Temporal Attention and Long-Term Temporal Attention modules to make health predictions. Experimental results on MIMIC-III demonstrate superior predictive accuracy of our model compared to existing state-of-the-art methods, as well as the interpretability and robustness of this approach. Furthermore, we found that modeling short-term correlations contributes to local priors generation, leading to improved predictive modeling of patient journeys.
LGJan 11, 2023
Multiple-level Point Embedding for Solving Human Trajectory Imputation with PredictionKyle K. Qin, Yongli Ren, Wei Shao et al.
Sparsity is a common issue in many trajectory datasets, including human mobility data. This issue frequently brings more difficulty to relevant learning tasks, such as trajectory imputation and prediction. Nowadays, little existing work simultaneously deals with imputation and prediction on human trajectories. This work plans to explore whether the learning process of imputation and prediction could benefit from each other to achieve better outcomes. And the question will be answered by studying the coexistence patterns between missing points and observed ones in incomplete trajectories. More specifically, the proposed model develops an imputation component based on the self-attention mechanism to capture the coexistence patterns between observations and missing points among encoder-decoder layers. Meanwhile, a recurrent unit is integrated to extract the sequential embeddings from newly imputed sequences for predicting the following location. Furthermore, a new implementation called Imputation Cycle is introduced to enable gradual imputation with prediction enhancement at multiple levels, which helps to accelerate the speed of convergence. The experimental results on three different real-world mobility datasets show that the proposed approach has significant advantages over the competitive baselines across both imputation and prediction tasks in terms of accuracy and stability.
LGJun 10, 2023
Continually learning out-of-distribution spatiotemporal data for robust energy forecastingArian Prabowo, Kaixuan Chen, Hao Xue et al.
Forecasting building energy usage is essential for promoting sustainability and reducing waste, as it enables building managers to optimize energy consumption and reduce costs. This importance is magnified during anomalous periods, such as the COVID-19 pandemic, which have disrupted occupancy patterns and made accurate forecasting more challenging. Forecasting energy usage during anomalous periods is difficult due to changes in occupancy patterns and energy usage behavior. One of the primary reasons for this is the shift in distribution of occupancy patterns, with many people working or learning from home. This has created a need for new forecasting methods that can adapt to changing occupancy patterns. Online learning has emerged as a promising solution to this challenge, as it enables building managers to adapt to changes in occupancy patterns and adjust energy usage accordingly. With online learning, models can be updated incrementally with each new data point, allowing them to learn and adapt in real-time. Another solution is to use human mobility data as a proxy for occupancy, leveraging the prevalence of mobile devices to track movement patterns and infer occupancy levels. Human mobility data can be useful in this context as it provides a way to monitor occupancy patterns without relying on traditional sensors or manual data collection methods. We have conducted extensive experiments using data from six buildings to test the efficacy of these approaches. However, deploying these methods in the real world presents several challenges.
LGDec 30, 2022
Detecting Change Intervals with Isolation Distributional KernelYang Cao, Ye Zhu, Kai Ming Ting et al.
Detecting abrupt changes in data distribution is one of the most significant tasks in streaming data analysis. Although many unsupervised Change-Point Detection (CPD) methods have been proposed recently to identify those changes, they still suffer from missing subtle changes, poor scalability, or/and sensitivity to outliers. To meet these challenges, we are the first to generalise the CPD problem as a special case of the Change-Interval Detection (CID) problem. Then we propose a CID method, named iCID, based on a recent Isolation Distributional Kernel (IDK). iCID identifies the change interval if there is a high dissimilarity score between two non-homogeneous temporal adjacent intervals. The data-dependent property and finite feature map of IDK enabled iCID to efficiently identify various types of change-points in data streams with the tolerance of outliers. Moreover, the proposed online and offline versions of iCID have the ability to optimise key parameter settings. The effectiveness and efficiency of iCID have been systematically verified on both synthetic and real-world datasets.
LGAug 19, 2023
Contrastive Learning-based Imputation-Prediction Networks for In-hospital Mortality Risk Modeling using EHRsYuxi Liu, Zhenhao Zhang, Shaowen Qin et al.
Predicting the risk of in-hospital mortality from electronic health records (EHRs) has received considerable attention. Such predictions will provide early warning of a patient's health condition to healthcare professionals so that timely interventions can be taken. This prediction task is challenging since EHR data are intrinsically irregular, with not only many missing values but also varying time intervals between medical records. Existing approaches focus on exploiting the variable correlations in patient medical records to impute missing values and establishing time-decay mechanisms to deal with such irregularity. This paper presents a novel contrastive learning-based imputation-prediction network for predicting in-hospital mortality risks using EHR data. Our approach introduces graph analysis-based patient stratification modeling in the imputation process to group similar patients. This allows information of similar patients only to be used, in addition to personal contextual information, for missing value imputation. Moreover, our approach can integrate contrastive learning into the proposed network architecture to enhance patient representation learning and predictive performance on the classification task. Experiments on two real-world EHR datasets show that our approach outperforms the state-of-the-art approaches in both imputation and prediction tasks.
LGApr 19, 2023
Equalised Odds is not Equal Individual Odds: Post-processing for Group and Individual FairnessEdward A. Small, Kacper Sokol, Daniel Manning et al.
Group fairness is achieved by equalising prediction distributions between protected sub-populations; individual fairness requires treating similar individuals alike. These two objectives, however, are incompatible when a scoring model is calibrated through discontinuous probability functions, where individuals can be randomly assigned an outcome determined by a fixed probability. This procedure may provide two similar individuals from the same protected group with classification odds that are disparately different -- a clear violation of individual fairness. Assigning unique odds to each protected sub-population may also prevent members of one sub-population from ever receiving equal chances of a positive outcome to another, which we argue is another type of unfairness called individual odds. We reconcile all this by constructing continuous probability functions between group thresholds that are constrained by their Lipschitz constant. Our solution preserves the model's predictive power, individual fairness and robustness while ensuring group fairness.
85.5AIApr 19Code
SOCIA-EVO: Automated Simulator Construction via Dual-Anchored Bi-Level OptimizationYuncheng Hua, Sion Weatherhead, Mehdi Jafari et al.
Automated simulator construction requires distributional fidelity, distinguishing it from generic code generation. We identify two failure modes in long-horizon LLM agents: contextual drift and optimization instability arising from conflating structural and parametric errors. We propose SOCIA-EVO, a dual-anchored evolutionary framework. SOCIA-EVO introduces: (1) a static blueprint to enforce empirical constraints; (2) a bi-level optimization to decouple structural refinement from parameter calibration; and (3) a self-curating Strategy Playbook that manages remedial hypotheses via Bayesian-weighted retrieval. By falsifying ineffective strategies through execution feedback, SOCIA-EVO achieves robust convergence, generating simulators that are statistically consistent with observational data. The code and data of SOCIA-EVO are available here: https://github.com/cruiseresearchgroup/SOCIA/tree/evo.
CLSep 15, 2023
MAPLE: Mobile App Prediction Leveraging Large Language Model EmbeddingsYonchanok Khaokaew, Hao Xue, Flora D. Salim
In recent years, predicting mobile app usage has become increasingly important for areas like app recommendation, user behaviour analysis, and mobile resource management. Existing models, however, struggle with the heterogeneous nature of contextual data and the user cold start problem. This study introduces a novel prediction model, Mobile App Prediction Leveraging Large Language Model Embeddings (MAPLE), which employs Large Language Models (LLMs) and installed app similarity to overcome these challenges. MAPLE utilises the power of LLMs to process contextual data and discern intricate relationships within it effectively. Additionally, we explore the use of installed app similarity to address the cold start problem, facilitating the modelling of user preferences and habits, even for new users with limited historical data. In essence, our research presents MAPLE as a novel, potent, and practical approach to app usage prediction, making significant strides in resolving issues faced by existing models. MAPLE stands out as a comprehensive and effective solution, setting a new benchmark for more precise and personalised app usage predictions. In tests on two real-world datasets, MAPLE surpasses contemporary models in both standard and cold start scenarios. These outcomes validate MAPLE's capacity for precise app usage predictions and its resilience against the cold start problem. This enhanced performance stems from the model's proficiency in capturing complex temporal patterns and leveraging contextual information. As a result, MAPLE can potentially improve personalised mobile app usage predictions and user experiences markedly.
LGAug 24, 2023
Hypergraph Convolutional Networks for Fine-grained ICU Patient Similarity Analysis and Risk PredictionYuxi Liu, Zhenhao Zhang, Shaowen Qin et al.
The Intensive Care Unit (ICU) is one of the most important parts of a hospital, which admits critically ill patients and provides continuous monitoring and treatment. Various patient outcome prediction methods have been attempted to assist healthcare professionals in clinical decision-making. Existing methods focus on measuring the similarity between patients using deep neural networks to capture the hidden feature structures. However, the higher-order relationships are ignored, such as patient characteristics (e.g., diagnosis codes) and their causal effects on downstream clinical predictions. In this paper, we propose a novel Hypergraph Convolutional Network that allows the representation of non-pairwise relationships among diagnosis codes in a hypergraph to capture the hidden feature structures so that fine-grained patient similarity can be calculated for personalized mortality risk prediction. Evaluation using a publicly available eICU Collaborative Research Database indicates that our method achieves superior performance over the state-of-the-art models on mortality risk prediction. Moreover, the results of several case studies demonstrated the effectiveness and robustness of the model decisions.
LGDec 7, 2022
SeqLink: A Robust Neural-ODE Architecture for Modelling Partially Observed Time SeriesFutoon M. Abushaqra, Hao Xue, Yongli Ren et al.
Ordinary Differential Equations (ODE) based models have become popular as foundation models for solving many time series problems. Combining neural ODEs with traditional RNN models has provided the best representation for irregular time series. However, ODE-based models typically require the trajectory of hidden states to be defined based on either the initial observed value or the most recent observation, raising questions about their effectiveness when dealing with longer sequences and extended time intervals. In this article, we explore the behaviour of the ODE models in the context of time series data with varying degrees of sparsity. We introduce SeqLink, an innovative neural architecture designed to enhance the robustness of sequence representation. Unlike traditional approaches that solely rely on the hidden state generated from the last observed value, SeqLink leverages ODE latent representations derived from multiple data samples, enabling it to generate robust data representations regardless of sequence length or data sparsity level. The core concept behind our model is the definition of hidden states for the unobserved values based on the relationships between samples (links between sequences). Through extensive experiments on partially observed synthetic and real-world datasets, we demonstrate that SeqLink improves the modelling of intermittent time series, consistently outperforming state-of-the-art approaches.
LGNov 11, 2022
Integrated Convolutional and Recurrent Neural Networks for Health Risk Prediction using Patient Journey Data with Many Missing ValuesYuxi Liu, Shaowen Qin, Antonio Jimeno Yepes et al.
Predicting the health risks of patients using Electronic Health Records (EHR) has attracted considerable attention in recent years, especially with the development of deep learning techniques. Health risk refers to the probability of the occurrence of a specific health outcome for a specific patient. The predicted risks can be used to support decision-making by healthcare professionals. EHRs are structured patient journey data. Each patient journey contains a chronological set of clinical events, and within each clinical event, there is a set of clinical/medical activities. Due to variations of patient conditions and treatment needs, EHR patient journey data has an inherently high degree of missingness that contains important information affecting relationships among variables, including time. Existing deep learning-based models generate imputed values for missing values when learning the relationships. However, imputed data in EHR patient journey data may distort the clinical meaning of the original EHR patient journey data, resulting in classification bias. This paper proposes a novel end-to-end approach to modeling EHR patient journey data with Integrated Convolutional and Recurrent Neural Networks. Our model can capture both long- and short-term temporal patterns within each patient journey and effectively handle the high degree of missingness in EHR data without any imputation data generation. Extensive experimental results using the proposed model on two real-world datasets demonstrate robust performance as well as superior prediction accuracy compared to existing state-of-the-art imputation-based prediction methods.
LGSep 8, 2023
Navigating Out-of-Distribution Electricity Load Forecasting during COVID-19: Benchmarking energy load forecasting models without and with continual learningArian Prabowo, Kaixuan Chen, Hao Xue et al.
In traditional deep learning algorithms, one of the key assumptions is that the data distribution remains constant during both training and deployment. However, this assumption becomes problematic when faced with Out-of-Distribution periods, such as the COVID-19 lockdowns, where the data distribution significantly deviates from what the model has seen during training. This paper employs a two-fold strategy: utilizing continual learning techniques to update models with new data and harnessing human mobility data collected from privacy-preserving pedestrian counters located outside buildings. In contrast to online learning, which suffers from 'catastrophic forgetting' as newly acquired knowledge often erases prior information, continual learning offers a holistic approach by preserving past insights while integrating new data. This research applies FSNet, a powerful continual learning algorithm, to real-world data from 13 building complexes in Melbourne, Australia, a city which had the second longest total lockdown duration globally during the pandemic. Results underscore the crucial role of continual learning in accurate energy forecasting, particularly during Out-of-Distribution periods. Secondary data such as mobility and temperature provided ancillary support to the primary forecasting model. More importantly, while traditional methods struggled to adapt during lockdowns, models featuring at least online learning demonstrated resilience, with lockdown periods posing fewer challenges once armed with adaptive learning techniques. This study contributes valuable methodologies and insights to the ongoing effort to improve energy load forecasting during future Out-of-Distribution periods.
LGSep 8, 2023
Counterfactual Explanations via Locally-guided Sequential Algorithmic RecourseEdward A. Small, Jeffrey N. Clark, Christopher J. McWilliams et al.
Counterfactuals operationalised through algorithmic recourse have become a powerful tool to make artificial intelligence systems explainable. Conceptually, given an individual classified as y -- the factual -- we seek actions such that their prediction becomes the desired class y' -- the counterfactual. This process offers algorithmic recourse that is (1) easy to customise and interpret, and (2) directly aligned with the goals of each individual. However, the properties of a "good" counterfactual are still largely debated; it remains an open challenge to effectively locate a counterfactual along with its corresponding recourse. Some strategies use gradient-driven methods, but these offer no guarantees on the feasibility of the recourse and are open to adversarial attacks on carefully created manifolds. This can lead to unfairness and lack of robustness. Other methods are data-driven, which mostly addresses the feasibility problem at the expense of privacy, security and secrecy as they require access to the entire training data set. Here, we introduce LocalFACE, a model-agnostic technique that composes feasible and actionable counterfactual explanations using locally-acquired information at each step of the algorithmic recourse. Our explainer preserves the privacy of users by only leveraging data that it specifically requires to construct actionable algorithmic recourse, and protects the model by offering transparency solely in the regions deemed necessary for the intervention.
82.7LGMay 11Code
TrajDLM: Topology-Aware Block Diffusion Language Model for Trajectory GenerationWilson Wongso, Lihuan Li, Arian Prabowo et al.
Generating high-fidelity synthetic GPS trajectories is increasingly important for applications in transportation, urban planning, and what-if scenario simulation, especially as privacy concerns limit access to real-world mobility data. Existing trajectory generation models face a trade-off between efficiency and faithfulness to road network topology: continuous-space methods enable fast generation but ignore the road network, while topology-aware approaches rely on search-based autoregressive decoding that limits generation speed. We propose TrajDLM, a topology-aware trajectory generation framework based on block diffusion language models that bridges this gap. TrajDLM models trajectories as sequences of discrete road segments, combining a block diffusion backbone for efficient denoising, topology-aware embeddings from a road network encoder, and topology-constrained sampling to ensure coherent and realistic trajectories. Across three city-scale datasets, TrajDLM achieves strong performance on fine-grained local similarity metrics while being up to $2.8\times$ faster than prior work, and demonstrates strong zero-shot transfer across domains, including unseen transportation modes. These results highlight the effectiveness of block-wise discrete diffusion as a scalable approach to accurate and efficient trajectory generation. Our code is available at https://github.com/cruiseresearchgroup/TrajDLM/
LGNov 7, 2025
QuAnTS: Question Answering on Time SeriesFelix Divo, Maurice Kraus, Anh Q. Nguyen et al.
Text offers intuitive access to information. This can, in particular, complement the density of numerical time series, thereby allowing improved interactions with time series models to enhance accessibility and decision-making. While the creation of question-answering datasets and models has recently seen remarkable growth, most research focuses on question answering (QA) on vision and text, with time series receiving minute attention. To bridge this gap, we propose a challenging novel time series QA (TSQA) dataset, QuAnTS, for Question Answering on Time Series data. Specifically, we pose a wide variety of questions and answers about human motion in the form of tracked skeleton trajectories. We verify that the large-scale QuAnTS dataset is well-formed and comprehensive through extensive experiments. Thoroughly evaluating existing and newly proposed baselines then lays the groundwork for a deeper exploration of TSQA using QuAnTS. Additionally, we provide human performances as a key reference for gauging the practical usability of such models. We hope to encourage future research on interacting with time series models through text, enabling better decision-making and more transparent systems.
LGFeb 17
UrbanVerse: Learning Urban Region Representation Across Cities and TasksFengze Sun, Egemen Tanin, Shanika Karunasekera et al.
Recent advances in urban region representation learning have enabled a wide range of applications in urban analytics, yet existing methods remain limited in their capabilities to generalize across cities and analytic tasks. We aim to generalize urban representation learning beyond city- and task-specific settings, towards a foundation-style model for urban analytics. To this end, we propose UrbanVerse, a model for cross-city urban representation learning and cross-task urban analytics. For cross-city generalization, UrbanVerse focuses on features local to the target regions and structural features of the nearby regions rather than the entire city. We model regions as nodes on a graph, which enables a random walk-based procedure to form "sequences of regions" that reflect both local and neighborhood structural features for urban region representation learning. For cross-task generalization, we propose a cross-task learning module named HCondDiffCT. This module integrates region-conditioned prior knowledge and task-conditioned semantics into the diffusion process to jointly model multiple downstream urban prediction tasks. HCondDiffCT is generic. It can also be integrated with existing urban representation learning models to enhance their downstream task effectiveness. Experiments on real-world datasets show that UrbanVerse consistently outperforms state-of-the-art methods across six tasks under cross-city settings, achieving up to 35.89% improvements in prediction accuracy.
AIOct 26, 2023
Utilizing Language Models for Energy Load ForecastingHao Xue, Flora D. Salim
Energy load forecasting plays a crucial role in optimizing resource allocation and managing energy consumption in buildings and cities. In this paper, we propose a novel approach that leverages language models for energy load forecasting. We employ prompting techniques to convert energy consumption data into descriptive sentences, enabling fine-tuning of language models. By adopting an autoregressive generating approach, our proposed method enables predictions of various horizons of future energy load consumption. Through extensive experiments on real-world datasets, we demonstrate the effectiveness and accuracy of our proposed method. Our results indicate that utilizing language models for energy load forecasting holds promise for enhancing energy efficiency and facilitating intelligent decision-making in energy systems.
LGJul 26, 2024
WorkR: Occupation Inference for Intelligent Task AssistanceYonchanok Khaokaew, Hao Xue, Mohammad Saiedur Rahaman et al.
Occupation information can be utilized by digital assistants to provide occupation-specific personalized task support, including interruption management, task planning, and recommendations. Prior research in the digital workplace assistant domain requires users to input their occupation information for effective support. However, as many individuals switch between multiple occupations daily, current solutions falter without continuous user input. To address this, this study introduces WorkR, a framework that leverages passive sensing to capture pervasive signals from various task activities, addressing three challenges: the lack of a passive sensing architecture, personalization of occupation characteristics, and discovering latent relationships among occupation variables. We argue that signals from application usage, movements, social interactions, and the environment can inform a user's occupation. WorkR uses a Variational Autoencoder (VAE) to derive latent features for training models to infer occupations. Our experiments with an anonymized, context-rich activity and task log dataset demonstrate that our models can accurately infer occupations with more than 91% accuracy across six ISO occupation categories.
AIFeb 2
DomusFM: A Foundation Model for Smart-Home Sensor DataMichele Fiori, Gabriele Civitarese, Flora D. Salim et al.
Smart-home sensor data holds significant potential for several applications, including healthcare monitoring and assistive technologies. Existing approaches, however, face critical limitations. Supervised models require impractical amounts of labeled data. Foundation models for activity recognition focus only on inertial sensors, failing to address the unique characteristics of smart-home binary sensor events: their sparse, discrete nature combined with rich semantic associations. LLM-based approaches, while tested in this domain, still raise several issues regarding the need for natural language descriptions or prompting, and reliance on either external services or expensive hardware, making them infeasible in real-life scenarios due to privacy and cost concerns. We introduce DomusFM, the first foundation model specifically designed and pretrained for smart-home sensor data. DomusFM employs a self-supervised dual contrastive learning paradigm to capture both token-level semantic attributes and sequence-level temporal dependencies. By integrating semantic embeddings from a lightweight language model and specialized encoders for temporal patterns and binary states, DomusFM learns generalizable representations that transfer across environments and tasks related to activity and event analysis. Through leave-one-dataset-out evaluation across seven public smart-home datasets, we demonstrate that DomusFM outperforms state-of-the-art baselines on different downstream tasks, achieving superior performance even with only 5% of labeled training data available for fine-tuning. Our approach addresses data scarcity while maintaining practical deployability for real-world smart-home systems.
CLOct 14, 2024Code
SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity RecognitionZechen Li, Shohreh Deldari, Linyao Chen et al.
We introduce SensorLLM, a two-stage framework that enables Large Language Models (LLMs) to perform human activity recognition (HAR) from sensor time-series data. Despite their strong reasoning and generalization capabilities, LLMs remain underutilized for motion sensor data due to the lack of semantic context in time-series, computational constraints, and challenges in processing numerical inputs. SensorLLM addresses these limitations through a Sensor-Language Alignment stage, where the model aligns sensor inputs with trend descriptions. Special tokens are introduced to mark channel boundaries. This alignment enables LLMs to capture numerical variations, channel-specific features, and data of varying durations, without requiring human annotations. In the subsequent Task-Aware Tuning stage, we refine the model for HAR classification, achieving performance that matches or surpasses state-of-the-art methods. Our results demonstrate that SensorLLM evolves into an effective sensor learner, reasoner, and classifier through human-intuitive Sensor-Language Alignment, generalizing across diverse HAR datasets. We believe this work establishes a foundation for future research on time-series and text alignment, paving the way for foundation models in sensor data analysis. Our codes are available at https://github.com/zechenli03/SensorLLM.
47.3CLApr 16
Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal UncertaintyChao Xue, Yao Wang, Mengqiao Liu et al.
Recent advancements in the Generative Reward Model (GRM) have demonstrated its potential to enhance the reasoning abilities of LLMs through Chain-of-Thought (CoT) prompting. Despite these gains, existing implementations of GRM suffer from two critical limitations. First, CoT prompting is applied indiscriminately to all inputs regardless of their inherent complexity. This introduces unnecessary computational costs for tasks amenable to fast, direct inference. Second, existing approaches primarily rely on voting-based mechanisms to evaluate CoT outputs, which often lack granularity and precision in assessing reasoning quality. In this paper, we propose E-GRM, an efficient generative reward modeling framework grounded in model-internal uncertainty. E-GRM leverages the convergence behavior of parallel model generations to estimate uncertainty and selectively trigger CoT reasoning only when needed, without relying on handcrafted features or task-dependent signals. To improve reward fidelity, we introduce a lightweight discriminative scorer trained with a hybrid regression--ranking objective to provide fine-grained evaluation of reasoning paths. Experiments on multiple reasoning benchmarks show that E-GRM substantially reduces inference cost while consistently improving answer accuracy, demonstrating that model-internal uncertainty is an effective and general signal for efficient reasoning-aware reward modeling.
81.8LGMay 18
CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual LearningYang Liu, Toan Nguyen, Flora D. Salim
Catastrophic forgetting remains a major obstacle to continual learning in large language models (LLMs) and vision--language models (VLMs). Although Mixture-of-Experts (MoE) architectures offer an efficient path to scaling, existing LoRA-based MoE continual learning methods still face a fundamental trade-off: they either isolate experts too aggressively, limiting knowledge transfer across tasks, or allow task-specific updates to overwrite important existing parameters, leading to severe forgetting. To address this, we propose CP-MoE, a continual learning framework built around a transient expert that captures early task-specific updates and guides their integration into stable experts. CP-MoE introduces a consistency-preserving routing bias, which uses the transient expert to estimate representation similarity with stable experts and steer routing towards more compatible expert selection, and a transient expert-guided regularisation mechanism, which selectively protects important historical parameters during merging. Together, these components reduce parameter interference and forgetting while preserving cross-task knowledge transfer. We validate CP-MoE on both unimodal and multimodal continual learning benchmarks with LLM-based and VLM-based MoE models. On SuperNI benchmark, spanning diverse sequential language tasks, CP-MoE achieves state-of-the-art performance and stronger zero-shot transfer to unseen tasks. On VQA v2 dataset, it scales effectively to multimodal visual reasoning, consistently reduces forgetting, and outperforms strong MoE baselines.
66.9CLApr 16
Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language ModelsChao Xue, Yao Wang, Mengqiao Liu et al.
Supervised Fine-Tuning (SFT) is the standard approach for adapting large language models (LLMs) to downstream tasks. However, we observe a persistent failure mode: even after convergence, models often fail to correctly reproduce a subset of their own supervised training data. We refer to this behavior as the Incomplete Learning Phenomenon(ILP). This paper presents the first systematic study of ILP in LLM fine-tuning. We formalize ILP as post-training failure to internalize supervised instances and demonstrate its prevalence across multiple model families, domains, and datasets. Through controlled analyses, we identify five recurrent sources of incomplete learning: (1) missing prerequisite knowledge in the pre-trained model, (2) conflicts between SFT supervision and pre-training knowledge, (3) internal inconsistencies within SFT data, (4) left-side forgetting during sequential fine-tuning, and (5) insufficient optimization for rare or complex patterns. We introduce a diagnostic-first framework that maps unlearned samples to these causes using observable training and inference signals, and study several targeted mitigation strategies as causal interventions. Experiments on Qwen, LLaMA, and OLMo2 show that incomplete learning is widespread and heterogeneous, and that improvements in aggregate metrics can mask persistent unlearned subsets. The findings highlight the need for fine-grained diagnosis of what supervised fine-tuning fails to learn, and why.
CLOct 2, 2023
Human Mobility Question Answering (Vision Paper)Hao Xue, Flora D. Salim
Question answering (QA) systems have attracted much attention from the artificial intelligence community as they can learn to answer questions based on the given knowledge source (e.g., images in visual question answering). However, the research into question answering systems with human mobility data remains unexplored. Mining human mobility data is crucial for various applications such as smart city planning, pandemic management, and personalised recommendation system. In this paper, we aim to tackle this gap and introduce a novel task, that is, human mobility question answering (MobQA). The aim of the task is to let the intelligent system learn from mobility data and answer related questions. This task presents a new paradigm change in mobility prediction research and further facilitates the research of human mobility recommendation systems. To better support this novel research topic, this vision paper also proposes an initial design of the dataset and a potential deep learning model framework for the introduced MobQA task. We hope that this paper will provide novel insights and open new directions in human mobility research and question answering research.
CLAug 6, 2025Code
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM AgentsZechen Li, Baiyu Chen, Hao Xue et al.
Motion sensor time-series are central to human activity recognition (HAR), with applications in health, sports, and smart devices. However, existing methods are trained for fixed activity sets and require costly retraining when new behaviours or sensor setups appear. Recent attempts to use large language models (LLMs) for HAR, typically by converting signals into text or images, suffer from limited accuracy and lack verifiable interpretability. We propose ZARA, the first agent-based framework for zero-shot, explainable HAR directly from raw motion time-series. ZARA integrates an automatically derived pair-wise feature knowledge base that captures discriminative statistics for every activity pair, a multi-sensor retrieval module that surfaces relevant evidence, and a hierarchical agent pipeline that guides the LLM to iteratively select features, draw on this evidence, and produce both activity predictions and natural-language explanations. ZARA enables flexible and interpretable HAR without any fine-tuning or task-specific classifiers. Extensive experiments on 8 HAR benchmarks show that ZARA achieves SOTA zero-shot performance, delivering clear reasoning while exceeding the strongest baselines by 2.53x in macro F1. Ablation studies further confirm the necessity of each module, marking ZARA as a promising step toward trustworthy, plug-and-play motion time-series analysis. Our codes are available at https://github.com/zechenli03/ZARA.
CRJul 19, 2024
AuditNet: A Conversational AI-based Security Assistant [DEMO]Shohreh Deldari, Mohammad Goudarzi, Aditya Joshi et al.
In the age of information overload, professionals across various fields face the challenge of navigating vast amounts of documentation and ever-evolving standards. Ensuring compliance with standards, regulations, and contractual obligations is a critical yet complex task across various professional fields. We propose a versatile conversational AI assistant framework designed to facilitate compliance checking on the go, in diverse domains, including but not limited to network infrastructure, legal contracts, educational standards, environmental regulations, and government policies. By leveraging retrieval-augmented generation using large language models, our framework automates the review, indexing, and retrieval of relevant, context-aware information, streamlining the process of verifying adherence to established guidelines and requirements. This AI assistant not only reduces the manual effort involved in compliance checks but also enhances accuracy and efficiency, supporting professionals in maintaining high standards of practice and ensuring regulatory compliance in their respective fields. We propose and demonstrate AuditNet, the first conversational AI security assistant designed to assist IoT network security experts by providing instant access to security standards, policies, and regulations.
CLDec 6, 2023Code
Measuring Misogyny in Natural Language Generation: Preliminary Results from a Case Study on two Reddit CommunitiesAaron J. Snoswell, Lucinda Nelson, Hao Xue et al.
Generic `toxicity' classifiers continue to be used for evaluating the potential for harm in natural language generation, despite mounting evidence of their shortcomings. We consider the challenge of measuring misogyny in natural language generation, and argue that generic `toxicity' classifiers are inadequate for this task. We use data from two well-characterised `Incel' communities on Reddit that differ primarily in their degrees of misogyny to construct a pair of training corpora which we use to fine-tune two language models. We show that an open source `toxicity' classifier is unable to distinguish meaningfully between generations from these models. We contrast this with a misogyny-specific lexicon recently proposed by feminist subject-matter experts, demonstrating that, despite the limitations of simple lexicon-based approaches, this shows promise as a benchmark to evaluate language models for misogyny, and that it is sensitive enough to reveal the known differences in these Reddit communities. Our preliminary findings highlight the limitations of a generic approach to evaluating harms, and further emphasise the need for careful benchmark design and selection in natural language evaluation.
LGFeb 21Code
SGNO: Spectral Generator Neural Operators for Stable Long Horizon PDE RolloutsJiayi Li, Zhaonan Wang, Flora D. Salim
Neural operators provide fast PDE surrogates and often generalize across parameters and resolutions. However, in the short train long test setting, autoregressive rollouts can become unstable. This typically happens for two reasons: one step errors accumulate over time, and high frequency components feed back and grow. We introduce the Spectral Generator Neural Operator (SGNO), a residual time stepper that targets both effects. For the linear part, SGNO uses an exponential time differencing update in Fourier space with a learned diagonal generator. We constrain the real part of this generator to be nonpositive, so iterating the step does not amplify the linear dynamics. For nonlinear dynamics, SGNO adds a gated forcing term with channel mixing within each Fourier mode, which keeps the nonlinear update controlled. To further limit high frequency feedback, SGNO applies spectral truncation and an optional smooth mask on the forcing pathway. We derive a one step amplification bound and a finite horizon rollout error bound. The bound separates generator approximation error from nonlinear mismatch and gives sufficient conditions under which the latent $L^2$ norm does not grow across rollout steps. On APEBench spanning 1D, 2D, and 3D PDE families, SGNO achieves lower long horizon error and longer stable rollout lengths than strong neural operator baselines. Ablations confirm the roles of the generator constraint, gating, and filtering.The code is available at https://github.com/lijy32123-cloud/SGNO.
CLFeb 17, 2025Code
RIDE: Enhancing Large Language Model Alignment through Restyled In-Context Learning Demonstration ExemplarsYuncheng Hua, Lizhen Qu, Zhuang Li et al.
Alignment tuning is crucial for ensuring large language models (LLMs) behave ethically and helpfully. Current alignment approaches require high-quality annotations and significant training resources. This paper proposes a low-cost, tuning-free method using in-context learning (ICL) to enhance LLM alignment. Through an analysis of high-quality ICL demos, we identified style as a key factor influencing LLM alignment capabilities and explicitly restyled ICL exemplars based on this stylistic framework. Additionally, we combined the restyled demos to achieve a balance between the two conflicting aspects of LLM alignment--factuality and safety. We packaged the restyled examples as prompts to trigger few-shot learning, improving LLM alignment. Compared to the best baseline approach, with an average score of 5.00 as the maximum, our method achieves a maximum 0.10 increase on the Alpaca task (from 4.50 to 4.60), a 0.22 enhancement on the Just-eval benchmark (from 4.34 to 4.56), and a maximum improvement of 0.32 (from 3.53 to 3.85) on the MT-Bench dataset. We release the code and data at https://github.com/AnonymousCode-ComputerScience/RIDE.
LGOct 24, 2023
ZzzGPT: An Interactive GPT Approach to Enhance Sleep QualityYonchanok Khaokaew, Kaixin Ji, Thuc Hanh Nguyen et al.
This paper explores the intersection of technology and sleep pattern comprehension, presenting a cutting-edge two-stage framework that harnesses the power of Large Language Models (LLMs). The primary objective is to deliver precise sleep predictions paired with actionable feedback, addressing the limitations of existing solutions. This innovative approach involves leveraging the GLOBEM dataset alongside synthetic data generated by LLMs. The results highlight significant improvements, underlining the efficacy of merging advanced machine-learning techniques with a user-centric design ethos. Through this exploration, we bridge the gap between technological sophistication and user-friendly design, ensuring that our framework yields accurate predictions and translates them into actionable insights.
AIJul 25, 2024
Long-term Fairness in Ride-Hailing PlatformYufan Kang, Jeffrey Chan, Wei Shao et al.
Matching in two-sided markets such as ride-hailing has recently received significant attention. However, existing studies on ride-hailing mainly focus on optimising efficiency, and fairness issues in ride-hailing have been neglected. Fairness issues in ride-hailing, including significant earning differences between drivers and variance of passenger waiting times among different locations, have potential impacts on economic and ethical aspects. The recent studies that focus on fairness in ride-hailing exploit traditional optimisation methods and the Markov Decision Process to balance efficiency and fairness. However, there are several issues in these existing studies, such as myopic short-term decision-making from traditional optimisation and instability of fairness in a comparably longer horizon from both traditional optimisation and Markov Decision Process-based methods. To address these issues, we propose a dynamic Markov Decision Process model to alleviate fairness issues currently faced by ride-hailing, and seek a balance between efficiency and fairness, with two distinct characteristics: (i) a prediction module to predict the number of requests that will be raised in the future from different locations to allow the proposed method to consider long-term fairness based on the whole timeline instead of consider fairness only based on historical and current data patterns; (ii) a customised scalarisation function for multi-objective multi-agent Q Learning that aims to balance efficiency and fairness. Extensive experiments on a publicly available real-world dataset demonstrate that our proposed method outperforms existing state-of-the-art methods.
89.7CVMar 16
HyperTokens: Controlling Token Dynamics for Continual Video-Language UnderstandingToan Nguyen, Yang Liu, Celso De Melo et al.
Continual VideoQA with multimodal LLMs is hindered by interference between tasks and the prohibitive cost of storing task-specific prompts. We introduce HyperTokens, a transformer-based token generator that produces fine-tuning tokens on demand, giving explicit control over prompt updates while keeping memory fixed. To suppress forgetting, we propose meta-inspired regularisers that look ahead to avoid task-specific sharp directions and anchor the evolving generator to prior tasks. We further connect our objective to sharpness-aware optimisation, providing insight into why it encourages flatter cross-task minima and improves retention. Beyond regularisation, HyperTokens exploits lightweight auxiliary multimodal supervision through shared generation weights; guided by a causal perspective, we design feasible objectives and surrogate mutual-information losses to regularise anti-causal cross-modal directions. Across two standard continual VideoQA benchmarks, HyperTokens achieves higher average accuracy with substantially lower forgetting. Finally, we introduce a challenging cross-modal ImageQA->VideoQA protocol and show that HyperTokens enables robust continual transfer in this setting.
78.2AIMay 11
MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge GraphsRuiyi Yang, Zechen Li, Hao Xue et al.
Self-evolving language-model agents must decide what to learn next and how to preserve what they have learned across iterations. Existing systems typically carry this cross-iteration knowledge as natural-language feedback, flat episodic memory, or implicit reinforcement signals, none of which cleanly supports a frozen weak backbone at inference time. This paper introduces MAGE (Multi-Agent Graph-guided Evolution), a framework that externalizes self-knowledge into a four-subgraph co-evolutionary knowledge graph. Its experience subgraph stores both teacher-written failure corrections and the learner's own past correct reasoning traces, which are retrieved as task-conditioned guidance for a frozen execution model. During evolution, the graph, a task-level search bandit, and a skill-level routing bandit are updated from the same reward stream, while the learner's backbone remains unchanged. We further provide structural analysis showing how append-only memory growth, bounded curriculum coverage, and task-filtered retrieval together support stable improvement of the retrieval substrate for frozen-learner evolution. Across nine benchmarks spanning mathematical reasoning, multi-hop and open-domain question answering, spatio-temporal analysis, financial numerical reasoning, medical multiple-choice, an open-world survival game, and web navigation, MAGE achieves strong performance against prompt-based frozen-backbone baselines. Ablations show that self-harvested success traces and teacher-written corrections are complementary, with success memories contributing most on reasoning-template-heavy tasks and corrective memories supporting harder composition and interaction settings.
83.4AIMay 11
Route by State, Recover from Trace: STAR with Failure-Aware Markov Routing for Multi-Agent Spatiotemporal ReasoningRuiyi Yang, Lihuan Li, Hao Xue et al.
Compositional spatiotemporal reasoning often requires a system to invoke multiple heterogeneous specialists, such as geometric, temporal, topological, and trajectory agents. A central question is how such a system should route among specialists when execution does not simply succeed or fail, but fails in qualitatively different ways. Existing tool-augmented and multi-agent LLM systems typically leave this routing decision implicit in language generation, making recovery ad hoc, difficult to interpret, and hard to optimize. This paper presents STAR (Spatio-Temporal Agent Router), a failure-aware routing framework that externalizes inter-agent control as a state-conditioned transition policy over the current agent, task type, and typed execution status. At the center of STARis an agent routing matrix that combines expert-specified nominal routes with recovery transitions learned from execution traces. Because the matrix conditions on distinct failure states, the router can respond differently to malformed outputs, missing dependencies, and tool--query mismatches, rather than collapsing them into a generic retry signal. Specialists execute through a tool-grounded extract--compute--deposit protocol and write intermediate results to a shared blackboard for downstream fusion. Results prove that retaining unsuccessful traces during training enlarges the support of the routing policy on error states, enabling recovery transitions that success-only training cannot represent. Across three spatiotemporal benchmarks and eight backbone LLMs, STAR improves over multiple baselines with the clearest gains on queries whose execution deviates from the nominal routing path. Router-specific ablations and recovery analyses further show that typed failure-aware routing, rather than specialist composition alone, is a key factor for these improvements.
LGSep 29, 2025Code
DRIFT-Net: A Spectral--Coupled Neural Operator for PDEs LearningJiayi Li, Flora D. Salim
Learning PDE dynamics with neural solvers can significantly improve wall-clock efficiency and accuracy compared with classical numerical solvers. In recent years, foundation models for PDEs have largely adopted multi-scale windowed self-attention, with the scOT backbone in \textsc{Poseidon} serving as a representative example. However, because of their locality, truly globally consistent spectral coupling can only be propagated gradually through deep stacking and window shifting. This weakens global coupling and leads to error accumulation and drift during closed-loop rollouts. To address this, we propose \textbf{DRIFT-Net}. It employs a dual-branch design comprising a spectral branch and an image branch. The spectral branch is responsible for capturing global, large-scale low-frequency information, whereas the image branch focuses on local details and nonstationary structures. Specifically, we first perform controlled, lightweight mixing within the low-frequency range. Then we fuse the spectral and image paths at each layer via bandwise weighting, which avoids the width inflation and training instability caused by naive concatenation. The fused result is transformed back into the spatial domain and added to the image branch, thereby preserving both global structure and high-frequency details across scales. Compared with strong attention-based baselines, DRIFT-Net achieves lower error and higher throughput with fewer parameters under identical training settings and budget. On Navier--Stokes benchmarks, the relative $L_{1}$ error is reduced by 7\%--54\%, the parameter count decreases by about 15\%, and the throughput remains higher than scOT. Ablation studies and theoretical analyses further demonstrate the stability and effectiveness of this design. The code is available at https://github.com/cruiseresearchgroup/DRIFT-Net.
CVJun 24, 2025Code
Generate the Forest before the Trees -- A Hierarchical Diffusion model for Climate DownscalingDeclan J. Curran, Sanaa Hobeichi, Hira Saleem et al.
Downscaling is essential for generating the high-resolution climate data needed for local planning, but traditional methods remain computationally demanding. Recent years have seen impressive results from AI downscaling models, particularly diffusion models, which have attracted attention due to their ability to generate ensembles and overcome the smoothing problem common in other AI methods. However, these models typically remain computationally intensive. We introduce a Hierarchical Diffusion Downscaling (HDD) model, which introduces an easily-extensible hierarchical sampling process to the diffusion framework. A coarse-to-fine hierarchy is imposed via a simple downsampling scheme. HDD achieves competitive accuracy on ERA5 reanalysis datasets and CMIP6 models, significantly reducing computational load by running on up to half as many pixels with competitive results. Additionally, a single model trained at 0.25° resolution transfers seamlessly across multiple CMIP6 models with much coarser resolution. HDD thus offers a lightweight alternative for probabilistic climate downscaling, facilitating affordable large-ensemble high-resolution climate projections. See a full code implementation at: https://github.com/HDD-Hierarchical-Diffusion-Downscaling/HDD-Hierarchical-Diffusion-Downscaling.
AIJun 19, 2024Code
ViLCo-Bench: VIdeo Language COntinual learning BenchmarkTianqi Tang, Shohreh Deldari, Hao Xue et al.
Video language continual learning involves continuously adapting to information from video and text inputs, enhancing a model's ability to handle new tasks while retaining prior knowledge. This field is a relatively under-explored area, and establishing appropriate datasets is crucial for facilitating communication and research in this field. In this study, we present the first dedicated benchmark, ViLCo-Bench, designed to evaluate continual learning models across a range of video-text tasks. The dataset comprises ten-minute-long videos and corresponding language queries collected from publicly available datasets. Additionally, we introduce a novel memory-efficient framework that incorporates self-supervised learning and mimics long-term and short-term memory effects. This framework addresses challenges including memory complexity from long video clips, natural language complexity from open queries, and text-video misalignment. We posit that ViLCo-Bench, with greater complexity compared to existing continual learning benchmarks, would serve as a critical tool for exploring the video-language domain, extending beyond conventional class-incremental tasks, and addressing complex and limited annotation issues. The curated data, evaluations, and our novel method are available at https://github.com/cruiseresearchgroup/ViLCo.
LGJun 13, 2024Code
BTS: Building Timeseries Dataset: Empowering Large-Scale Building AnalyticsArian Prabowo, Xiachong Lin, Imran Razzak et al.
Buildings play a crucial role in human well-being, influencing occupant comfort, health, and safety. Additionally, they contribute significantly to global energy consumption, accounting for one-third of total energy usage, and carbon emissions. Optimizing building performance presents a vital opportunity to combat climate change and promote human flourishing. However, research in building analytics has been hampered by the lack of accessible, available, and comprehensive real-world datasets on multiple building operations. In this paper, we introduce the Building TimeSeries (BTS) dataset. Our dataset covers three buildings over a three-year period, comprising more than ten thousand timeseries data points with hundreds of unique ontologies. Moreover, the metadata is standardized using the Brick schema. To demonstrate the utility of this dataset, we performed benchmarks on two tasks: timeseries ontology classification and zero-shot forecasting. These tasks represent an essential initial step in addressing challenges related to interoperability in building analytics. Access to the dataset and the code used for benchmarking are available here: https://github.com/cruiseresearchgroup/DIEF_BTS .
LGFeb 16, 2025Code
PAR-AdvGAN: Improving Adversarial Attack Capability with Progressive Auto-Regression AdvGANJiayu Zhang, Zhiyu Zhu, Xinyi Wang et al.
Deep neural networks have demonstrated remarkable performance across various domains. However, they are vulnerable to adversarial examples, which can lead to erroneous predictions. Generative Adversarial Networks (GANs) can leverage the generators and discriminators model to quickly produce high-quality adversarial examples. Since both modules train in a competitive and simultaneous manner, GAN-based algorithms like AdvGAN can generate adversarial examples with better transferability compared to traditional methods. However, the generation of perturbations is usually limited to a single iteration, preventing these examples from fully exploiting the potential of the methods. To tackle this issue, we introduce a novel approach named Progressive Auto-Regression AdvGAN (PAR-AdvGAN). It incorporates an auto-regressive iteration mechanism within a progressive generation network to craft adversarial examples with enhanced attack capability. We thoroughly evaluate our PAR-AdvGAN method with a large-scale experiment, demonstrating its superior performance over various state-of-the-art black-box adversarial attacks, as well as the original AdvGAN.Moreover, PAR-AdvGAN significantly accelerates the adversarial example generation, i.e., achieving the speeds of up to 335.5 frames per second on Inception-v3 model, outperforming the gradient-based transferable attack algorithms. Our code is available at: https://github.com/LMBTough/PAR
LGNov 11, 2024Code
ODEStream: A Buffer-Free Online Learning Framework with ODE-based Adaptor for Streaming Time Series ForecastingFutoon M. Abushaqra, Hao Xue, Yongli Ren et al.
Addressing the challenges of irregularity and concept drift in streaming time series is crucial for real-world predictive modelling. Previous studies in time series continual learning often propose models that require buffering long sequences, potentially restricting the responsiveness of the inference system. Moreover, these models are typically designed for regularly sampled data, an unrealistic assumption in real-world scenarios. This paper introduces ODEStream, a novel buffer-free continual learning framework that incorporates a temporal isolation layer to capture temporal dependencies within the data. Simultaneously, it leverages the capability of neural ordinary differential equations to process irregular sequences and generate a continuous data representation, enabling seamless adaptation to changing dynamics in a data streaming scenario. Our approach focuses on learning how the dynamics and distribution of historical data change over time, facilitating direct processing of streaming sequences. Evaluations on benchmark real-world datasets demonstrate that ODEStream outperforms the state-of-the-art online learning and streaming analysis baseline models, providing accurate predictions over extended periods while minimising performance degradation over time by learning how the sequence dynamics change. The implementation of ODEStream is available at: https://github.com/FtoonAbushaqra/ODEStream.git.
LGMay 9, 2023Code
Traffic Forecasting on New Roads Using Spatial Contrastive Pre-Training (SCPT)Arian Prabowo, Hao Xue, Wei Shao et al.
New roads are being constructed all the time. However, the capabilities of previous deep forecasting models to generalize to new roads not seen in the training data (unseen roads) are rarely explored. In this paper, we introduce a novel setup called a spatio-temporal (ST) split to evaluate the models' capabilities to generalize to unseen roads. In this setup, the models are trained on data from a sample of roads, but tested on roads not seen in the training data. Moreover, we also present a novel framework called Spatial Contrastive Pre-Training (SCPT) where we introduce a spatial encoder module to extract latent features from unseen roads during inference time. This spatial encoder is pre-trained using contrastive learning. During inference, the spatial encoder only requires two days of traffic data on the new roads and does not require any re-training. We also show that the output from the spatial encoder can be used effectively to infer latent node embeddings on unseen roads during inference time. The SCPT framework also incorporates a new layer, named the spatially gated addition (SGA) layer, to effectively combine the latent features from the output of the spatial encoder to existing backbones. Additionally, since there is limited data on the unseen roads, we argue that it is better to decouple traffic signals to trivial-to-capture periodic signals and difficult-to-capture Markovian signals, and for the spatial encoder to only learn the Markovian signals. Finally, we empirically evaluated SCPT using the ST split setup on four real-world datasets. The results showed that adding SCPT to a backbone consistently improves forecasting performance on unseen roads. More importantly, the improvements are greater when forecasting further into the future. The codes are available on GitHub: https://github.com/cruiseresearchgroup/forecasting-on-new-roads .
LGDec 14, 2021Code
Event-Aware Multimodal Mobility NowcastingZhaonan Wang, Renhe Jiang, Hao Xue et al.
As a decisive part in the success of Mobility-as-a-Service (MaaS), spatio-temporal predictive modeling for crowd movements is a challenging task particularly considering scenarios where societal events drive mobility behavior deviated from the normality. While tremendous progress has been made to model high-level spatio-temporal regularities with deep learning, most, if not all of the existing methods are neither aware of the dynamic interactions among multiple transport modes nor adaptive to unprecedented volatility brought by potential societal events. In this paper, we are therefore motivated to improve the canonical spatio-temporal network (ST-Net) from two perspectives: (1) design a heterogeneous mobility information network (HMIN) to explicitly represent intermodality in multimodal mobility; (2) propose a memory-augmented dynamic filter generator (MDFG) to generate sequence-specific parameters in an on-the-fly fashion for various scenarios. The enhanced event-aware spatio-temporal network, namely EAST-Net, is evaluated on several real-world datasets with a wide variety and coverage of societal events. Both quantitative and qualitative experimental results verify the superiority of our approach compared with the state-of-the-art baselines. Code and data are published on https://github.com/underdoc-wang/EAST-Net.
IRApr 19, 2024
Large Language Models for Next Point-of-Interest RecommendationPeibo Li, Maarten de Rijke, Hao Xue et al.
The next Point of Interest (POI) recommendation task is to predict users' immediate next POI visit given their historical data. Location-Based Social Network (LBSN) data, which is often used for the next POI recommendation task, comes with challenges. One frequently disregarded challenge is how to effectively use the abundant contextual information present in LBSN data. Previous methods are limited by their numerical nature and fail to address this challenge. In this paper, we propose a framework that uses pretrained Large Language Models (LLMs) to tackle this challenge. Our framework allows us to preserve heterogeneous LBSN data in its original format, hence avoiding the loss of contextual information. Furthermore, our framework is capable of comprehending the inherent meaning of contextual information due to the inclusion of commonsense knowledge. In experiments, we test our framework on three real-world LBSN datasets. Our results show that the proposed framework outperforms the state-of-the-art models in all three datasets. Our analysis demonstrates the effectiveness of the proposed framework in using contextual information as well as alleviating the commonly encountered cold-start and short trajectory problems.