Gao Cong

h-index68

30papers

1,179citations

Novelty50%

AI Score50

Ranked #19,169 of 194,257 authors (top 10%)#31 in DB (top 7%)

30 Papers

32.5AIApr 13, 2023

On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence

Gengchen Mai, Weiming Huang, Jin Sun et al. · stanford

Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have yet seen an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first investigate the potential of many existing FMs by testing their performances on seven tasks across multiple geospatial subdomains including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing. Our results indicate that on several geospatial tasks that only involve text modality such as toponym recognition, location description recognition, and US state-level/county-level dementia time series forecasting, these task-agnostic LLMs can outperform task-specific fully-supervised models in a zero-shot or few-shot learning setting. However, on other geospatial tasks, especially tasks that involve multiple data modalities (e.g., POI-based urban function classification, street view image-based urban noise intensity classification, and remote sensing image scene classification), existing foundation models still underperform task-specific models. Based on these observations, we propose that one of the major challenges of developing a FM for GeoAI is to address the multimodality nature of geospatial tasks. After discussing the distinct challenges of each geospatial data modality, we suggest the possibility of a multimodal foundation model which can reason over various types of geospatial data through geospatial alignments. We conclude this paper by discussing the unique risks and challenges to develop such a model for GeoAI.

34.0LGOct 9, 2023Code

Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis

Zezhi Shao, Fei Wang, Yongjun Xu et al.

Multivariate Time Series (MTS) analysis is crucial to understanding and managing complex systems, such as traffic and energy systems, and a variety of approaches to MTS forecasting have been proposed recently. However, we often observe inconsistent or seemingly contradictory performance findings across different studies. This hinders our understanding of the merits of different approaches and slows down progress. We address the need for means of assessing MTS forecasting proposals reliably and fairly, in turn enabling better exploitation of MTS as seen in different applications. Specifically, we first propose BasicTS+, a benchmark designed to enable fair, comprehensive, and reproducible comparison of MTS forecasting solutions. BasicTS+ establishes a unified training pipeline and reasonable settings, enabling an unbiased evaluation. Second, we identify the heterogeneity across different MTS as an important consideration and enable classification of MTS based on their temporal and spatial characteristics. Disregarding this heterogeneity is a prime reason for difficulties in selecting the most promising technical directions. Third, we apply BasicTS+ along with rich datasets to assess the capabilities of more than 45 MTS forecasting solutions. This provides readers with an overall picture of the cutting-edge research on MTS forecasting. The code can be accessed at https://github.com/GestaltCogTeam/BasicTS.

14.3SEMay 10Code

MACAA: Belief-Revision Multi-Agent Reasoning for Open-World Code Authorship Verification

Jingwei Ye, Zhi Wang, Xin Li et al.

Code authorship attribution (CAA) supports software forensics, plagiarism detection, and intellectual property protection. However, existing supervised CAA approaches suffer from scarce training data and closed-world assumptions: they require sufficient labeled code from fixed candidate-author sets, making training difficult in low-data cases and predictions unreliable for open-world test pairs with unseen samples, or heterogeneous code pairs. Large language models remove task-specific training, but direct prompting depends on costly expert-designed prompts, can hallucinate over complex heterogeneous code pairs, and rarely yields auditable evidence traces. We propose MACAA, a belief-revision-based multi-agent framework for training-free code authorship verification. MACAA comprises a Coordinator and four Expert Agents analyzing layout, lexical, syntactic, and programming-pattern evidence. The Coordinator gathers expert signals for expansion, discounts unreliable evidence through contraction, and resolves conflicts through revision to preserve belief consistency, replacing direct LLM judgment with auditable hypothesis refinement. MACAA achieves 89.15\% F1 on same-language benchmarks and 80.00\% on mixed cross-language pairs, surpassing all baselines.

16.0CVNov 15, 2022Code

Region Embedding with Intra and Inter-View Contrastive Learning

Liang Zhang, Cheng Long, Gao Cong

Unsupervised region representation learning aims to extract dense and effective features from unlabeled urban data. While some efforts have been made for solving this problem based on multiple views, existing methods are still insufficient in extracting representations in a view and/or incorporating representations from different views. Motivated by the success of contrastive learning for representation learning, we propose to leverage it for multi-view region representation learning and design a model called ReMVC (Region Embedding with Multi-View Contrastive Learning) by following two guidelines: i) comparing a region with others within each view for effective representation extraction and ii) comparing a region with itself across different views for cross-view information sharing. We design the intra-view contrastive learning module which helps to learn distinguished region embeddings and the inter-view contrastive learning module which serves as a soft co-regularizer to constrain the embedding parameters and transfer knowledge across multi-views. We exploit the learned region embeddings in two downstream tasks named land usage clustering and region popularity prediction. Extensive experiments demonstrate that our model achieves impressive improvements compared with seven state-of-the-art baseline methods, and the margins are over 30% in the land usage clustering task.

7.3DBFeb 28, 2023

WISK: A Workload-aware Learned Index for Spatial Keyword Queries

Yufan Sheng, Xin Cao, Yixiang Fang et al.

Spatial objects often come with textual information, such as Points of Interest (POIs) with their descriptions, which are referred to as geo-textual data. To retrieve such data, spatial keyword queries that take into account both spatial proximity and textual relevance have been extensively studied. Existing indexes designed for spatial keyword queries are mostly built based on the geo-textual data without considering the distribution of queries already received. However, previous studies have shown that utilizing the known query distribution can improve the index structure for future query processing. In this paper, we propose WISK, a learned index for spatial keyword queries, which self-adapts for optimizing querying costs given a query workload. One key challenge is how to utilize both structured spatial attributes and unstructured textual information during learning the index. We first divide the data objects into partitions, aiming to minimize the processing costs of the given query workload. We prove the NP-hardness of the partitioning problem and propose a machine learning model to find the optimal partitions. Then, to achieve more pruning power, we build a hierarchical structure based on the generated partitions in a bottom-up manner with a reinforcement learning-based approach. We conduct extensive experiments on real-world datasets and query workloads with various distributions, and the results show that WISK outperforms all competitors, achieving up to 8x speedup in querying time with comparable storage overhead.

7.3DBNov 12, 2022Code

Online Anomalous Subtrajectory Detection on Road Networks with Deep Reinforcement Learning

Qianru Zhang, Zheng Wang, Cheng Long et al.

Detecting anomalous trajectories has become an important task in many location-based applications. While many approaches have been proposed for this task, they suffer from various issues including (1) incapability of detecting anomalous subtrajectories, which are finer-grained anomalies in trajectory data, and/or (2) non-data driven, and/or (3) requirement of sufficient supervision labels which are costly to collect. In this paper, we propose a novel reinforcement learning based solution called RL4OASD, which avoids all aforementioned issues of existing approaches. RL4OASD involves two networks, one responsible for learning features of road networks and trajectories and the other responsible for detecting anomalous subtrajectories based on the learned features, and the two networks can be trained iteratively without labeled data. Extensive experiments are conducted on two real datasets, and the results show that our solution can significantly outperform the state-of-the-art methods (with 20-30% improvement) and is efficient for online detection (it takes less than 0.1ms to process each newly generated data point).

8.7LGOct 14, 2022

Not All Neighbors Are Worth Attending to: Graph Selective Attention Networks for Semi-supervised Learning

Tiantian He, Haicang Zhou, Yew-Soon Ong et al.

Graph attention networks (GATs) are powerful tools for analyzing graph data from various real-world scenarios. To learn representations for downstream tasks, GATs generally attend to all neighbors of the central node when aggregating the features. In this paper, we show that a large portion of the neighbors are irrelevant to the central nodes in many real-world graphs, and can be excluded from neighbor aggregation. Taking the cue, we present Selective Attention (SA) and a series of novel attention mechanisms for graph neural networks (GNNs). SA leverages diverse forms of learnable node-node dissimilarity to acquire the scope of attention for each node, from which irrelevant neighbors are excluded. We further propose Graph selective attention networks (SATs) to learn representations from the highly correlated node features identified and investigated by different SA mechanisms. Lastly, theoretical analysis on the expressive power of the proposed SATs and a comprehensive empirical study of the SATs on challenging real-world datasets against state-of-the-art GNNs are presented to demonstrate the effectiveness of SATs.

4.3DBSep 23, 2024Code

CAMAL: Optimizing LSM-trees via Active Learning

Weiping Yu, Siqiang Luo, Zihao Yu et al.

We use machine learning to optimize LSM-tree structure, aiming to reduce the cost of processing various read/write operations. We introduce a new approach Camal, which boasts the following features: (1) ML-Aided: Camal is the first attempt to apply active learning to tune LSM-tree based key-value stores. The learning process is coupled with traditional cost models to improve the training process; (2) Decoupled Active Learning: backed by rigorous analysis, Camal adopts active learning paradigm based on a decoupled tuning of each parameter, which further accelerates the learning process; (3) Easy Extrapolation: Camal adopts an effective mechanism to incrementally update the model with the growth of the data size; (4) Dynamic Mode: Camal is able to tune LSM-tree online under dynamically changing workloads; (5) Significant System Improvement: By integrating Camal into a full system RocksDB, the system performance improves by 28% on average and up to 8x compared to a state-of-the-art RocksDB design.

12.5AIAug 22, 2024

Self-Supervised Representation Learning for Geospatial Objects: A Survey

Yile Chen, Weiming Huang, Kaiqi Zhao et al.

The proliferation of various data sources in urban and territorial environments has significantly facilitated the development of geospatial artificial intelligence (GeoAI) across a wide range of geospatial applications. However, geospatial data, which is inherently linked to geospatial objects, often exhibits data heterogeneity that necessitates specialized fusion and representation strategies while simultaneously being inherently sparse in labels for downstream tasks. Consequently, there is a growing demand for techniques that can effectively leverage geospatial data without heavy reliance on task-specific labels and model designs. This need aligns with the principles of self-supervised learning (SSL), which has garnered increasing attention for its ability to learn effective and generalizable representations directly from data without extensive labeled supervision. This paper presents a comprehensive and up-to-date survey of SSL techniques specifically applied to or developed for geospatial objects in three primary vector geometric types: Point, Polyline, and Polygon. We systematically categorize various SSL techniques into predictive and contrastive methods, and analyze their adaptation to different data types for representation learning across various downstream tasks. Furthermore, we examine the emerging trends in SSL for geospatial objects, particularly the gradual advancements towards geospatial foundation models. Finally, we discuss key challenges in current research and outline promising directions for future investigation. By offering a structured analysis of existing studies, this paper aims to inspire continued progress in integrating SSL with geospatial objects, and the development of geospatial foundation models in a longer term.

13.8DBOct 1, 2023

City Foundation Models for Learning General Purpose Representations from OpenStreetMap

Pasquale Balsebre, Weiming Huang, Gao Cong et al.

Pre-trained Foundation Models (PFMs) have ushered in a paradigm-shift in Artificial Intelligence, due to their ability to learn general-purpose representations that can be readily employed in a wide range of downstream tasks. While PFMs have been successfully adopted in various fields such as Natural Language Processing and Computer Vision, their capacity in handling geospatial data and answering urban questions remains limited. This can be attributed to the intrinsic heterogeneity of geospatial data, which encompasses different data types, including points, segments and regions, as well as multiple information modalities, such as a spatial position, visual characteristics and textual annotations. The proliferation of Volunteered Geographic Information initiatives, and the ever-increasing availability of open geospatial data sources, like OpenStreetMap, which is freely accessible globally, unveil a promising opportunity to bridge this gap. In this paper, we present CityFM, a self-supervised framework to train a foundation model within a selected geographical area of interest, such as a city. CityFM relies solely on open data from OSM, and produces multimodal representations of entities of different types, incorporating spatial, visual, and textual information. We analyse the entity representations generated using our foundation models from a qualitative perspective, and conduct quantitative experiments on road, building, and region-level downstream tasks. We compare its results to algorithms tailored specifically for the respective applications. In all the experiments, CityFM achieves performance superior to, or on par with, the baselines.

0.5CLJan 25, 2023

Improving the Inference of Topic Models via Infinite Latent State Replications

Daniel Rugeles, Zhen Hai, Juan Felipe Carmona et al.

In text mining, topic models are a type of probabilistic generative models for inferring latent semantic topics from text corpus. One of the most popular inference approaches to topic models is perhaps collapsed Gibbs sampling (CGS), which typically samples one single topic label for each observed document-word pair. In this paper, we aim at improving the inference of CGS for topic models. We propose to leverage state augmentation technique by maximizing the number of topic samples to infinity, and then develop a new inference approach, called infinite latent state replication (ILR), to generate robust soft topic assignment for each given document-word pair. Experimental results on the publicly available datasets show that ILR outperforms CGS for inference of existing established topic models.

18.1DBSep 13, 2021Code

Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation

Yuxing Han, Ziniu Wu, Peizhi Wu et al.

Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query optimizer in DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based) have been proposed with outstanding estimation accuracy and inference latency. However, there exists no study that systematically evaluates the quality of these methods and answer the fundamental problem: to what extent can these methods improve the performance of query optimizer in real-world settings, which is the ultimate goal of a CardEst method. In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real DBMS. We establish a new benchmark for CardEst, which contains a new complex real-world dataset STATS and a diverse query workload STATS-CEB. We integrate multiple most representative CardEst methods into an open-source database system PostgreSQL, and comprehensively evaluate their true effectiveness in improving query plan quality, and other important aspects affecting their applicability, ranging from inference latency, model size, and training time, to update efficiency and accuracy. We obtain a number of key findings for the CardEst methods, under different data and query settings. Furthermore, we find that the widely used estimation accuracy metric(Q-Error) cannot distinguish the importance of different sub-plan queries during query optimization and thus cannot truly reflect the query plan quality generated by CardEst methods. Therefore, we propose a new metric P-Error to evaluate the performance of CardEst methods, which overcomes the limitation of Q-Error and is able to reflect the overall end-to-end performance of CardEst methods. We have made all of the benchmark data and evaluation code publicly available at https://github.com/Nathaniel-Han/End-to-End-CardEst-Benchmark.

22.7LGFeb 6, 2024

AirPhyNet: Harnessing Physics-Guided Neural Networks for Air Quality Prediction

Kethmi Hirushini Hettige, Jiahao Ji, Shili Xiang et al.

Air quality prediction and modelling plays a pivotal role in public health and environment management, for individuals and authorities to make informed decisions. Although traditional data-driven models have shown promise in this domain, their long-term prediction accuracy can be limited, especially in scenarios with sparse or incomplete data and they often rely on black-box deep learning structures that lack solid physical foundation leading to reduced transparency and interpretability in predictions. To address these limitations, this paper presents a novel approach named Physics guided Neural Network for Air Quality Prediction (AirPhyNet). Specifically, we leverage two well-established physics principles of air particle movement (diffusion and advection) by representing them as differential equation networks. Then, we utilize a graph structure to integrate physics knowledge into a neural network architecture and exploit latent representations to capture spatio-temporal relationships within the air quality data. Experiments on two real-world benchmark datasets demonstrate that AirPhyNet outperforms state-of-the-art models for different testing scenarios including different lead time (24h, 48h, 72h), sparse data and sudden change prediction, achieving reduction in prediction errors up to 10%. Moreover, a case study further validates that our model captures underlying physical processes of particle movement and generates accurate predictions with real physical meaning.

7.9AIDec 22, 2023

AdapTraj: A Multi-Source Domain Generalization Framework for Multi-Agent Trajectory Prediction

Tangwen Qian, Yile Chen, Gao Cong et al.

Multi-agent trajectory prediction, as a critical task in modeling complex interactions of objects in dynamic systems, has attracted significant research attention in recent years. Despite the promising advances, existing studies all follow the assumption that data distribution observed during model learning matches that encountered in real-world deployments. However, this assumption often does not hold in practice, as inherent distribution shifts might exist in the mobility patterns for deployment environments, thus leading to poor domain generalization and performance degradation. Consequently, it is appealing to leverage trajectories from multiple source domains to mitigate such discrepancies for multi-agent trajectory prediction task. However, the development of multi-source domain generalization in this task presents two notable issues: (1) negative transfer; (2) inadequate modeling for external factors. To address these issues, we propose a new causal formulation to explicitly model four types of features: domain-invariant and domain-specific features for both the focal agent and neighboring agents. Building upon the new formulation, we propose AdapTraj, a multi-source domain generalization framework specifically tailored for multi-agent trajectory prediction. AdapTraj serves as a plug-and-play module that is adaptable to a variety of models. Extensive experiments on four datasets with different domains demonstrate that AdapTraj consistently outperforms other baselines by a substantial margin.

9.2LGMar 18, 2024

Semantic-Enhanced Representation Learning for Road Networks with Temporal Dynamics

Yile Chen, Xiucheng Li, Gao Cong et al.

In this study, we introduce a novel framework called Toast for learning general-purpose representations of road networks, along with its advanced counterpart DyToast, designed to enhance the integration of temporal dynamics to boost the performance of various time-sensitive downstream tasks. Specifically, we propose to encode two pivotal semantic characteristics intrinsic to road networks: traffic patterns and traveling semantics. To achieve this, we refine the skip-gram module by incorporating auxiliary objectives aimed at predicting the traffic context associated with a target road segment. Moreover, we leverage trajectory data and design pre-training strategies based on Transformer to distill traveling semantics on road networks. DyToast further augments this framework by employing unified trigonometric functions characterized by their beneficial properties, enabling the capture of temporal evolution and dynamic nature of road networks more effectively. With these proposed techniques, we can obtain representations that encode multi-faceted aspects of knowledge within road networks, applicable across both road segment-based applications and trajectory-based applications. Extensive experiments on two real-world datasets across three tasks demonstrate that our proposed framework consistently outperforms the state-of-the-art baselines by a significant margin.

5.9DBNov 1, 2024

CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation

Ziting Wang, Haitao Yuan, Wei Dong et al.

Large Language Models (LLMs) have demonstrated remarkable generation capabilities but often struggle to access up-to-date information, which can lead to hallucinations. Retrieval-Augmented Generation (RAG) addresses this issue by incorporating knowledge from external databases, enabling more accurate and relevant responses. Due to the context window constraints of LLMs, it is impractical to input the entire external database context directly into the model. Instead, only the most relevant information, referred to as chunks, is selectively retrieved. However, current RAG research faces three key challenges. First, existing solutions often select each chunk independently, overlooking potential correlations among them. Second, in practice the utility of chunks is non-monotonic, meaning that adding more chunks can decrease overall utility. Traditional methods emphasize maximizing the number of included chunks, which can inadvertently compromise performance. Third, each type of user query possesses unique characteristics that require tailored handling, an aspect that current approaches do not fully consider. To overcome these challenges, we propose a cost constrained retrieval optimization system CORAG for retrieval-augmented generation. We employ a Monte Carlo Tree Search (MCTS) based policy framework to find optimal chunk combinations sequentially, allowing for a comprehensive consideration of correlations among chunks. Additionally, rather than viewing budget exhaustion as a termination condition, we integrate budget constraints into the optimization of chunk combinations, effectively addressing the non-monotonicity of chunk utility.

7.3AIOct 17, 2024

Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models

Tangwen Qian, Junhe Li, Yile Chen et al.

Modeling trajectory data with generic-purpose dense representations has become a prevalent paradigm for various downstream applications, such as trajectory classification, travel time estimation and similarity computation. However, existing methods typically rely on trajectories from a single spatial view, limiting their ability to capture the rich contextual information that is crucial for gaining deeper insights into movement patterns across different geospatial contexts. To this end, we propose MVTraj, a novel multi-view modeling method for trajectory representation learning. MVTraj integrates diverse contextual knowledge, from GPS to road network and points-of-interest to provide a more comprehensive understanding of trajectory data. To align the learning process across multiple views, we utilize GPS trajectories as a bridge and employ self-supervised pretext tasks to capture and distinguish movement patterns across different spatial views. Following this, we treat trajectories from different views as distinct modalities and apply a hierarchical cross-modal interaction module to fuse the representations, thereby enriching the knowledge derived from multiple sources. Extensive experiments on real-world datasets demonstrate that MVTraj significantly outperforms existing baselines in tasks associated with various spatial views, validating its effectiveness and practical utility in spatio-temporal modeling.

12.0CLJun 8, 2025

Enhancing Large Language Models for Mobility Analytics with Semantic Location Tokenization

Yile Chen, Yicheng Tao, Yue Jiang et al.

The widespread adoption of location-based services has led to the generation of vast amounts of mobility data, providing significant opportunities to model user movement dynamics within urban environments. Recent advancements have focused on adapting Large Language Models (LLMs) for mobility analytics. However, existing methods face two primary limitations: inadequate semantic representation of locations (i.e., discrete IDs) and insufficient modeling of mobility signals within LLMs (i.e., single templated instruction fine-tuning). To address these issues, we propose QT-Mob, a novel framework that significantly enhances LLMs for mobility analytics. QT-Mob introduces a location tokenization module that learns compact, semantically rich tokens to represent locations, preserving contextual information while ensuring compatibility with LLMs. Furthermore, QT-Mob incorporates a series of complementary fine-tuning objectives that align the learned tokens with the internal representations in LLMs, improving the model's comprehension of sequential movement patterns and location semantics. The proposed QT-Mob framework not only enhances LLMs' ability to interpret mobility data but also provides a more generalizable approach for various mobility analytics tasks. Experiments on three real-world dataset demonstrate the superior performance in both next-location prediction and mobility recovery tasks, outperforming existing deep learning and LLM-based methods.

8.3CLJun 25, 2025

A Modular Multitask Reasoning Framework Integrating Spatio-temporal Models and LLMs

Kethmi Hirushini Hettige, Jiahao Ji, Cheng Long et al.

Spatio-temporal data mining plays a pivotal role in informed decision making across diverse domains. However, existing models are often restricted to narrow tasks, lacking the capacity for multi-task inference and complex long-form reasoning that require generation of in-depth, explanatory outputs. These limitations restrict their applicability to real-world, multi-faceted decision scenarios. In this work, we introduce STReason, a novel framework that integrates the reasoning strengths of large language models (LLMs) with the analytical capabilities of spatio-temporal models for multi-task inference and execution. Without requiring task-specific finetuning, STReason leverages in-context learning to decompose complex natural language queries into modular, interpretable programs, which are then systematically executed to generate both solutions and detailed rationales. To facilitate rigorous evaluation, we construct a new benchmark dataset and propose a unified evaluation framework with metrics specifically designed for long-form spatio-temporal reasoning. Experimental results show that STReason significantly outperforms advanced LLM baselines across all metrics, particularly excelling in complex, reasoning-intensive spatio-temporal scenarios. Human evaluations further validate STReason's credibility and practical utility, demonstrating its potential to reduce expert workload and broaden the applicability to real-world spatio-temporal tasks. We believe STReason provides a promising direction for developing more capable and generalizable spatio-temporal reasoning systems.

2.6LGOct 30, 2024Code

FlexTSF: A Flexible Forecasting Model for Time Series with Variable Regularities

Jingge Xiao, Yile Chen, Gao Cong et al.

Forecasting time series with irregular temporal structures remains challenging for universal pre-trained models. Existing approaches often assume regular sampling or depend heavily on imputation, limiting their applicability in real-world scenarios where irregularities are prevalent due to diverse sensing devices and recording practices. We introduce FlexTSF, a flexible forecasting model specifically designed for time series data with variable temporal regularities. At its foundation lies the IVP Patcher, a continuous-time patching module leveraging Initial Value Problems (IVPs) to inherently support uneven time intervals, variable sequence lengths, and missing values. FlexTSF employs a decoder-only architecture that integrates normalized timestamp inputs and domain-specific statistics through a specialized causal self-attention mechanism, enabling adaptability across domains. Extensive experiments on 16 datasets demonstrate FlexTSF's effectiveness, significantly outperforming existing models in classic forecasting scenarios, zero-shot generalization, and low-resource fine-tuning conditions. Ablation studies confirm the contributions of each design component and the advantage of not relying on predefined fixed patch lengths.

12.5LGJun 18, 2024Code

SAGDFN: A Scalable Adaptive Graph Diffusion Forecasting Network for Multivariate Time Series Forecasting

Yue Jiang, Xiucheng Li, Yile Chen et al.

Time series forecasting is essential for our daily activities and precise modeling of the complex correlations and shared patterns among multiple time series is essential for improving forecasting performance. Spatial-Temporal Graph Neural Networks (STGNNs) are widely used in multivariate time series forecasting tasks and have achieved promising performance on multiple real-world datasets for their ability to model the underlying complex spatial and temporal dependencies. However, existing studies have mainly focused on datasets comprising only a few hundred sensors due to the heavy computational cost and memory cost of spatial-temporal GNNs. When applied to larger datasets, these methods fail to capture the underlying complex spatial dependencies and exhibit limited scalability and performance. To this end, we present a Scalable Adaptive Graph Diffusion Forecasting Network (SAGDFN) to capture complex spatial-temporal correlation for large-scale multivariate time series and thereby, leading to exceptional performance in multivariate time series forecasting tasks. The proposed SAGDFN is scalable to datasets of thousands of nodes without the need of prior knowledge of spatial correlation. Extensive experiments demonstrate that SAGDFN achieves comparable performance with state-of-the-art baselines on one real-world dataset of 207 nodes and outperforms all state-of-the-art baselines by a significant margin on three real-world datasets of 2000 nodes.

8.2CLMar 14, 2024Code

LAMP: A Language Model on the Map

Pasquale Balsebre, Weiming Huang, Gao Cong

Large Language Models (LLMs) are poised to play an increasingly important role in our lives, providing assistance across a wide array of tasks. In the geospatial domain, LLMs have demonstrated the ability to answer generic questions, such as identifying a country's capital; nonetheless, their utility is hindered when it comes to answering fine-grained questions about specific places, such as grocery stores or restaurants, which constitute essential aspects of people's everyday lives. This is mainly because the places in our cities haven't been systematically fed into LLMs, so as to understand and memorize them. This study introduces a novel framework for fine-tuning a pre-trained model on city-specific data, to enable it to provide accurate recommendations, while minimizing hallucinations. We share our model, LAMP, and the data used to train it. We conduct experiments to analyze its ability to correctly retrieving spatial objects, and compare it to well-known open- and closed- source language models, such as GPT-4. Finally, we explore its emerging capabilities through a case study on day planning.

6.2AIFeb 28, 2022

Points-of-Interest Relationship Inference with Spatial-enriched Graph Neural Networks

Yile Chen, Xiucheng Li, Gao Cong et al.

As a fundamental component in location-based services, inferring the relationship between points-of-interests (POIs) is very critical for service providers to offer good user experience to business owners and customers. Most of the existing methods for relationship inference are not targeted at POI, thus failing to capture unique spatial characteristics that have huge effects on POI relationships. In this work we propose PRIM to tackle POI relationship inference for multiple relation types. PRIM features four novel components, including a weighted relational graph neural network, category taxonomy integration, a self-attentive spatial context extractor, and a distance-specific scoring function. Extensive experiments on two real-world datasets show that PRIM achieves the best results compared to state-of-the-art baselines and it is robust against data sparsity and is applicable to unseen cases in practice.

4.3DBMar 8, 2021

A Reinforcement Learning Based R-Tree for Spatial Data Indexing in Dynamic Environments

Tu Gu, Kaiyu Feng, Gao Cong et al.

Learned indices have been proposed to replace classic index structures like B-Tree with machine learning (ML) models. They require to replace both the indices and query processing algorithms currently deployed by the databases, and such a radical departure is likely to encounter challenges and obstacles. In contrast, we propose a fundamentally different way of using ML techniques to improve on the query performance of the classic R-Tree without the need of changing its structure or query processing algorithms. Specifically, we develop reinforcement learning (RL) based models to decide how to choose a subtree for insertion and how to split a node when building an R-Tree, instead of relying on hand-crafted heuristic rules currently used by R-Tree and its variants. Experiments on real and synthetic datasets with up to more than 100 million spatial objects clearly show that our RL based index outperforms R-Tree and its variants in terms of query processing time.

3.0IRNov 20, 2020

Exploring Global Information for Session-based Recommendation

Ziyang Wang, Wei Wei, Gao Cong et al.

Session-based recommendation (SBR) is a challenging task, which aims at recommending items based on anonymous behavior sequences. Most existing SBR studies model the user preferences based only on the current session while neglecting the item-transition information from the other sessions, which suffer from the inability of modeling the complicated item-transition pattern. To address the limitations, we introduce global item-transition information to strength the modeling of the dynamic item-transition. For fully exploiting the global item-transition information, two ways of exploring global information for SBR are studied in this work. Specifically, we first propose a basic GNN-based framework (BGNN), which solely uses session-level item-transition information on session graph. Based on BGNN, we propose a novel approach, called Session-based Recommendation with Global Information (SRGI), which infers the user preferences via fully exploring global item-transitions over all sessions from two different perspectives: (i) Fusion-based Model (SRGI-FM), which recursively incorporates the neighbor embeddings of each node on global graph into the learning process of session level item representation; and (ii) Constrained-based Model (SRGI-CM), which treats the global-level item-transition information as a constraint to ensure the learned item embeddings are consistent with the global item-transition. Extensive experiments conducted on three popular benchmark datasets demonstrate that both SRGI-FM and SRGI-CM outperform the state-of-the-art methods consistently.

4.3DBMar 5, 2020

Efficient and Effective Similar Subtrajectory Search with Deep Reinforcement Learning

Zheng Wang, Cheng Long, Gao Cong et al.

Similar trajectory search is a fundamental problem and has been well studied over the past two decades. However, the similar subtrajectory search (SimSub) problem, aiming to return a portion of a trajectory (i.e., a subtrajectory) which is the most similar to a query trajectory, has been mostly disregarded despite that it could capture trajectory similarity in a finer-grained way and many applications take subtrajectories as basic units for analysis. In this paper, we study the SimSub problem and develop a suite of algorithms including both exact and approximate ones. Among those approximate algorithms, two that are based on deep reinforcement learning stand out and outperform those non-learning based algorithms in terms of effectiveness and efficiency. We conduct experiments on real-world trajectory datasets, which verify the effectiveness and efficiency of the proposed algorithms.

0.8LGDec 17, 2018

Representation Learning for Spatial Graphs

Zheng Wang, Ce Ju, Gao Cong et al.

Recently, the topic of graph representation learning has received plenty of attention. Existing approaches usually focus on structural properties only and thus they are not sufficient for those spatial graphs where the nodes are associated with some spatial information. In this paper, we present the first deep learning approach called s2vec for learning spatial graph representations, which is based on denoising autoencoders framework (DAF). We evaluate the learned representations on real datasets and the results verified the effectiveness of s2vec when used for spatial clustering.

27.6IRSep 5, 2018

HyperML: A Boosting Metric Learning Approach in Hyperbolic Space for Recommender Systems

Lucas Vinh Tran, Yi Tay, Shuai Zhang et al.

This paper investigates the notion of learning user and item representations in non-Euclidean space. Specifically, we study the connection between metric learning in hyperbolic space and collaborative filtering by exploring Mobius gyrovector spaces where the formalism of the spaces could be utilized to generalize the most common Euclidean vector operations. Overall, this work aims to bridge the gap between Euclidean and hyperbolic geometry in recommender systems through metric learning approach. We propose HyperML (Hyperbolic Metric Learning), a conceptually simple but highly effective model for boosting the performance. Via a series of extensive experiments, we show that our proposed HyperML not only outperforms their Euclidean counterparts, but also achieves state-of-the-art performance on multiple benchmark datasets, demonstrating the effectiveness of personalized recommendation in hyperbolic geometry.

14.3AIApr 12, 2018

Interact and Decide: Medley of Sub-Attention Networks for Effective Group Recommendation

Lucas Vinh Tran, Tuan-Anh Nguyen Pham, Yi Tay et al.

This paper proposes Medley of Sub-Attention Networks (MoSAN), a new novel neural architecture for the group recommendation task. Group-level recommendation is known to be a challenging task, in which intricate group dynamics have to be considered. As such, this is to be contrasted with the standard recommendation problem where recommendations are personalized with respect to a single user. Our proposed approach hinges upon the key intuition that the decision making process (in groups) is generally dynamic, i.e., a user's decision is highly dependent on the other group members. All in all, our key motivation manifests in a form of an attentive neural model that captures fine-grained interactions between group members. In our MoSAN model, each sub-attention module is representative of a single member, which models a user's preference with respect to all other group members. Subsequently, a Medley of Sub-Attention modules is then used to collectively make the group's final decision. Overall, our proposed model is both expressive and effective. Via a series of extensive experiments, we show that MoSAN not only achieves state-of-the-art performance but also improves standard baselines by a considerable margin.

0.8LGFeb 19, 2018Code

Heron Inference for Bayesian Graphical Models

Daniel Rugeles, Zhen Hai, Gao Cong et al.

Bayesian graphical models have been shown to be a powerful tool for discovering uncertainty and causal structure from real-world data in many application fields. Current inference methods primarily follow different kinds of trade-offs between computational complexity and predictive accuracy. At one end of the spectrum, variational inference approaches perform well in computational efficiency, while at the other end, Gibbs sampling approaches are known to be relatively accurate for prediction in practice. In this paper, we extend an existing Gibbs sampling method, and propose a new deterministic Heron inference (Heron) for a family of Bayesian graphical models. In addition to the support for nontrivial distributability, one more benefit of Heron is that it is able to not only allow us to easily assess the convergence status but also largely improve the running efficiency. We evaluate Heron against the standard collapsed Gibbs sampler and state-of-the-art state augmentation method in inference for well-known graphical models. Experimental results using publicly available real-life data have demonstrated that Heron significantly outperforms the baseline methods for inferring Bayesian graphical models.