DB · May 29 · Code
FDABench: A Benchmark for Data Agents on Analytical Queries over Heterogeneous Data
Ziting Wang, Shize Zhang, Haitao Yuan et al.
The growing demand for data-driven decision-making has created an urgent need for data agents that can reason over heterogeneous data (databases, documents, web content, images, videos, and audio) to answer complex analytical queries. However, evaluating such agents remains challenging: existing benchmarks often focus on isolated agent capabilities or limited data modalities, lacking comprehensive coverage of heterogeneous data and rigorous evaluation across diverse data agent architectures. To address these challenges, we present FDABench, a benchmark for evaluating data agents' reasoning ability over heterogeneous data in analytical scenarios. Our contributions are threefold: (1) a comprehensive benchmark of 2,007 tasks spanning six data modalities, with a unified, multi-granularity evaluation framework; (2) PUDDING, an agentic dataset construction framework that combines LLM generation with iterative expert validation for reliable and scalable benchmark construction; and (3) extensive experiments across diverse data agent architectures, including general analytical agents, semantic operator frameworks, and RAG-based methods, which reveal key insights and guidelines for future data agent development. Our data and source code are released at https://github.com/fdabench/FDAbench.
AI · Apr 13, 2023
On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence
Gengchen Mai, Weiming Huang, Jin Sun et al. · stanford
Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have yet to see an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first investigate the potential of many existing FMs by testing their performance on seven tasks across multiple geospatial subdomains, including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing. Our results indicate that on several geospatial tasks that involve only the text modality, such as toponym recognition, location description recognition, and US state-level/county-level dementia time series forecasting, these task-agnostic LLMs can outperform task-specific fully-supervised models in a zero-shot or few-shot learning setting. However, on other geospatial tasks, especially tasks that involve multiple data modalities (e.g., POI-based urban function classification, street view image-based urban noise intensity classification, and remote sensing image scene classification), existing foundation models still underperform task-specific models. Based on these observations, we propose that one of the major challenges of developing an FM for GeoAI is to address the multimodal nature of geospatial tasks. After discussing the distinct challenges of each geospatial data modality, we suggest the possibility of a multimodal foundation model that can reason over various types of geospatial data through geospatial alignments. We conclude this paper by discussing the unique risks and challenges of developing such a model for GeoAI.
LG · Jun 3 · Code
CausalPOI: Spatio-Temporal Graph-Based Causal Modeling for Cold-Start POI Check-in Forecasting
Zhaoqi Zhang, Miao Xie, Yi Li et al.
As urban environments continue to evolve rapidly, accurately modeling the dynamic behaviour of Points of Interest (POIs) is essential for supporting data-driven urban planning and commercial decision-making. While recent advancements in spatio-temporal graph learning have improved POI forecasting, most methods rely on proximity-based graphs and correlation-driven modeling, which overlook the functional dependencies between POIs and fail to capture the causal effects of urban interventions. In this paper, we introduce a novel research problem, cold-start POI check-in forecasting, which aims to predict the future check-in pattern of a newly introduced POI by modeling its temporal evolution and functional interactions with nearby POIs in a structured urban spatial context. To address these challenges, we propose CausalPOI, a spatio-temporal graph-based causal representation learning framework. CausalPOI leverages a Spatio-Temporal Functional Interaction Graph to model semantic and spatial relationships between POIs, and constructs structurally aligned treatment and control graphs to simulate factual and counterfactual scenarios. Extensive experiments on real-world SafeGraph datasets demonstrate that CausalPOI significantly outperforms state-of-the-art baselines across the board, validating its effectiveness in spatio-temporal forecasting, semantic interaction modeling, and causal effect estimation, and providing a more interpretable and actionable foundation for urban intervention analysis. Source code is available on GitHub.
LG · Oct 9, 2023 · Code
Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis
Zezhi Shao, Fei Wang, Yongjun Xu et al.
Multivariate Time Series (MTS) analysis is crucial to understanding and managing complex systems, such as traffic and energy systems, and a variety of approaches to MTS forecasting have been proposed recently. However, we often observe inconsistent or seemingly contradictory performance findings across different studies. This hinders our understanding of the merits of different approaches and slows down progress. We address the need to assess MTS forecasting proposals reliably and fairly, which in turn enables better exploitation of MTS across different applications. Specifically, we first propose BasicTS+, a benchmark designed to enable fair, comprehensive, and reproducible comparison of MTS forecasting solutions. BasicTS+ establishes a unified training pipeline and reasonable settings, enabling an unbiased evaluation. Second, we identify the heterogeneity across different MTS as an important consideration and enable classification of MTS based on their temporal and spatial characteristics. Disregarding this heterogeneity is a prime reason for difficulties in selecting the most promising technical directions. Third, we apply BasicTS+ along with rich datasets to assess the capabilities of more than 45 MTS forecasting solutions. This provides readers with an overall picture of the cutting-edge research on MTS forecasting. The code can be accessed at https://github.com/GestaltCogTeam/BasicTS.
CV · Nov 15, 2022
Region Embedding with Intra and Inter-View Contrastive Learning
Liang Zhang, Cheng Long, Gao Cong
Unsupervised region representation learning aims to extract dense and effective features from unlabeled urban data. While some efforts have been made to solve this problem based on multiple views, existing methods are still insufficient at extracting representations within a view and/or incorporating representations from different views. Motivated by the success of contrastive learning for representation learning, we propose to leverage it for multi-view region representation learning and design a model called ReMVC (Region Embedding with Multi-View Contrastive Learning) following two guidelines: i) comparing a region with others within each view for effective representation extraction, and ii) comparing a region with itself across different views for cross-view information sharing. We design an intra-view contrastive learning module, which helps to learn distinguishable region embeddings, and an inter-view contrastive learning module, which serves as a soft co-regularizer to constrain the embedding parameters and transfer knowledge across multiple views. We exploit the learned region embeddings in two downstream tasks, land usage clustering and region popularity prediction. Extensive experiments demonstrate that our model achieves substantial improvements over seven state-of-the-art baseline methods, with margins of over 30% in the land usage clustering task.
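As an illustration of guideline ii) (comparing a region with itself across views), the sketch below computes a standard InfoNCE loss in which region i in one view is the positive for region i in the other view and every other region is a negative. This is a generic contrastive objective written for illustration only: the cosine similarity, the temperature of 0.1, and the function names are assumptions, not ReMVC's published loss.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def inter_view_infonce(view_a, view_b, temperature=0.1):
    """Average InfoNCE loss: region i in view_a is paired with region i in
    view_b (positive); all other regions in view_b act as negatives."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive
    return total / n
```

When the two views are aligned the loss is close to zero; shuffling the rows of one view, so that each region's positive is no longer its own embedding, makes the loss large.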
DB · Feb 28, 2023
WISK: A Workload-aware Learned Index for Spatial Keyword Queries
Yufan Sheng, Xin Cao, Yixiang Fang et al.
Spatial objects often come with textual information, such as Points of Interest (POIs) with their descriptions, which are referred to as geo-textual data. To retrieve such data, spatial keyword queries that take into account both spatial proximity and textual relevance have been extensively studied. Existing indexes designed for spatial keyword queries are mostly built on the geo-textual data without considering the distribution of queries already received. However, previous studies have shown that utilizing the known query distribution can improve the index structure for future query processing. In this paper, we propose WISK, a learned index for spatial keyword queries, which self-adapts to optimize querying costs given a query workload. One key challenge is how to utilize both structured spatial attributes and unstructured textual information when learning the index. We first divide the data objects into partitions, aiming to minimize the processing costs of the given query workload. We prove the NP-hardness of the partitioning problem and propose a machine learning model to find the optimal partitions. Then, to achieve more pruning power, we build a hierarchical structure on the generated partitions in a bottom-up manner with a reinforcement learning-based approach. We conduct extensive experiments on real-world datasets and query workloads with various distributions, and the results show that WISK outperforms all competitors, achieving up to 8x speedup in querying time with comparable storage overhead.
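To make the query semantics concrete, the sketch below answers one common variant of spatial keyword query, a boolean range query (objects inside a rectangle whose text contains all query keywords), by a linear scan. The abstract does not say which variant WISK targets, so the query type, the `GeoTextObject` record, and the function name are assumptions. A learned index such as WISK aims to return the same answer while pruning most objects instead of scanning all of them.

```python
from dataclasses import dataclass

@dataclass
class GeoTextObject:
    # Hypothetical geo-textual record: a point location plus a keyword set.
    x: float
    y: float
    keywords: frozenset

def spatial_keyword_range_query(objects, rect, query_keywords):
    """Return objects inside rect = (xmin, ymin, xmax, ymax) whose keyword set
    contains every query keyword. Linear scan: O(n) per query."""
    xmin, ymin, xmax, ymax = rect
    q = set(query_keywords)
    return [
        o for o in objects
        if xmin <= o.x <= xmax and ymin <= o.y <= ymax and q <= o.keywords
    ]
```

The scan touches every object for every query; the point of a workload-aware index is to lay out partitions so that, for the queries actually observed, most partitions can be skipped by their spatial or textual bounds.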
DB · Nov 12, 2022
Online Anomalous Subtrajectory Detection on Road Networks with Deep Reinforcement Learning
Qianru Zhang, Zheng Wang, Cheng Long et al.
Detecting anomalous trajectories has become an important task in many location-based applications. While many approaches have been proposed for this task, they suffer from various issues, including (1) an inability to detect anomalous subtrajectories, which are finer-grained anomalies in trajectory data, and/or (2) not being data-driven, and/or (3) requiring sufficient supervision labels, which are costly to collect. In this paper, we propose a novel reinforcement learning based solution called RL4OASD, which avoids all of the aforementioned issues of existing approaches. RL4OASD involves two networks, one responsible for learning features of road networks and trajectories and the other responsible for detecting anomalous subtrajectories based on the learned features, and the two networks can be trained iteratively without labeled data. Extensive experiments are conducted on two real datasets, and the results show that our solution significantly outperforms the state-of-the-art methods (with 20-30% improvement) and is efficient for online detection (it takes less than 0.1 ms to process each newly generated data point).
LG · Oct 14, 2022
Not All Neighbors Are Worth Attending to: Graph Selective Attention Networks for Semi-supervised Learning
Tiantian He, Haicang Zhou, Yew-Soon Ong et al.
Graph attention networks (GATs) are powerful tools for analyzing graph data from various real-world scenarios. To learn representations for downstream tasks, GATs generally attend to all neighbors of the central node when aggregating the features. In this paper, we show that a large portion of the neighbors are irrelevant to the central nodes in many real-world graphs, and can be excluded from neighbor aggregation. Taking the cue, we present Selective Attention (SA) and a series of novel attention mechanisms for graph neural networks (GNNs). SA leverages diverse forms of learnable node-node dissimilarity to acquire the scope of attention for each node, from which irrelevant neighbors are excluded. We further propose Graph selective attention networks (SATs) to learn representations from the highly correlated node features identified and investigated by different SA mechanisms. Lastly, theoretical analysis on the expressive power of the proposed SATs and a comprehensive empirical study of the SATs on challenging real-world datasets against state-of-the-art GNNs are presented to demonstrate the effectiveness of SATs.
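A simplified sketch of the selective-attention idea: neighbors that are too dissimilar from the central node are dropped before softmax attention is computed over the rest. In SATs the dissimilarity is learnable; here a fixed Euclidean-distance threshold, dot-product attention scores, and the fallback of returning the center's own features when every neighbor is excluded are all hypothetical simplifications.

```python
import math

def selective_attention_aggregate(center, neighbors, max_dissimilarity):
    """Aggregate neighbor feature vectors with softmax attention, restricted to
    neighbors whose Euclidean distance to the center is <= max_dissimilarity.
    Returns a copy of the center's features if every neighbor is excluded."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    # Scope of attention: keep only sufficiently similar neighbors.
    kept = [h for h in neighbors if dist(center, h) <= max_dissimilarity]
    if not kept:
        return list(center)
    # Dot-product scores, softmax-normalized with max subtraction for stability.
    scores = [sum(a * b for a, b in zip(center, h)) for h in kept]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(center)
    return [sum(w * h[d] for w, h in zip(weights, kept)) / z for d in range(dim)]
```

With an unrestricted GAT-style layer, a distant neighbor such as `[-5.0, 0.0]` would still receive some weight; the threshold removes it from the aggregation entirely.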
DB · Sep 23, 2024
CAMAL: Optimizing LSM-trees via Active Learning
Weiping Yu, Siqiang Luo, Zihao Yu et al.
We use machine learning to optimize LSM-tree structure, aiming to reduce the cost of processing various read/write operations. We introduce a new approach, Camal, which has the following features: (1) ML-Aided: Camal is the first attempt to apply active learning to tune LSM-tree based key-value stores, and the learning process is coupled with traditional cost models to improve training; (2) Decoupled Active Learning: backed by rigorous analysis, Camal adopts an active learning paradigm based on decoupled tuning of each parameter, which further accelerates the learning process; (3) Easy Extrapolation: Camal adopts an effective mechanism to incrementally update the model as the data size grows; (4) Dynamic Mode: Camal is able to tune the LSM-tree online under dynamically changing workloads; (5) Significant System Improvement: by integrating Camal into RocksDB, a full system, the system performance improves by 28% on average and by up to 8x compared to a state-of-the-art RocksDB design.
AI · Aug 22, 2024
Self-Supervised Representation Learning for Geospatial Objects: A Survey
Yile Chen, Weiming Huang, Kaiqi Zhao et al.
The proliferation of various data sources in urban and territorial environments has significantly facilitated the development of geospatial artificial intelligence (GeoAI) across a wide range of geospatial applications. However, geospatial data, which is inherently linked to geospatial objects, often exhibits data heterogeneity that necessitates specialized fusion and representation strategies, while also being inherently sparse in labels for downstream tasks. Consequently, there is a growing demand for techniques that can effectively leverage geospatial data without heavy reliance on task-specific labels and model designs. This need aligns with the principles of self-supervised learning (SSL), which has garnered increasing attention for its ability to learn effective and generalizable representations directly from data without extensive labeled supervision. This paper presents a comprehensive and up-to-date survey of SSL techniques specifically applied to or developed for geospatial objects of three primary vector geometric types: Point, Polyline, and Polygon. We systematically categorize various SSL techniques into predictive and contrastive methods, and analyze their adaptation to different data types for representation learning across various downstream tasks. Furthermore, we examine the emerging trends in SSL for geospatial objects, particularly the gradual advancement towards geospatial foundation models. Finally, we discuss key challenges in current research and outline promising directions for future investigation. By offering a structured analysis of existing studies, this paper aims to inspire continued progress in integrating SSL with geospatial objects and, in the longer term, the development of geospatial foundation models.
DB · Oct 1, 2023
City Foundation Models for Learning General Purpose Representations from OpenStreetMap
Pasquale Balsebre, Weiming Huang, Gao Cong et al.
Pre-trained Foundation Models (PFMs) have ushered in a paradigm shift in Artificial Intelligence, due to their ability to learn general-purpose representations that can be readily employed in a wide range of downstream tasks. While PFMs have been successfully adopted in various fields such as Natural Language Processing and Computer Vision, their capacity for handling geospatial data and answering urban questions remains limited. This can be attributed to the intrinsic heterogeneity of geospatial data, which encompasses different data types, including points, segments, and regions, as well as multiple information modalities, such as spatial position, visual characteristics, and textual annotations. The proliferation of Volunteered Geographic Information initiatives, and the ever-increasing availability of open geospatial data sources like OpenStreetMap (OSM), which is freely accessible globally, unveil a promising opportunity to bridge this gap. In this paper, we present CityFM, a self-supervised framework to train a foundation model within a selected geographical area of interest, such as a city. CityFM relies solely on open data from OSM and produces multimodal representations of entities of different types, incorporating spatial, visual, and textual information. We analyse the entity representations generated by our foundation models from a qualitative perspective, and conduct quantitative experiments on road, building, and region-level downstream tasks. We compare its results to algorithms tailored specifically for the respective applications. In all the experiments, CityFM achieves performance superior to, or on par with, the baselines.
LG · May 20
Beyond Single Slot: Joint Optimization for Multi-Slot Guaranteed Display Advertising
Zhaoqi Zhang, Jiaming Deng, Miao Xie et al.
Guaranteed display (GD) advertising is crucial for platform monetization, yet existing methods often operate under a single-slot assumption, limiting their ability to optimize allocation across multi-slot page views. In this paper, we propose a novel joint optimization framework for multi-slot GD allocation, addressing key challenges such as slot-level redundancy, contract imbalance, and exposure concentration. Our approach formulates the allocation as an offline bipartite matching problem with a contract roulette mechanism for slot exclusivity and Page View constraints for impression control, and incorporates a scalable allocation optimization algorithm for efficient large-scale deployment. Extensive online tests on the Meituan advertising platform demonstrate that our method significantly improves merchant ROI, platform revenue efficiency, and contract fulfillment robustness. Specifically, online A/B tests show a 28.99% increase in Average Revenue Per User under 70% traffic, and DID analysis further indicates improved contract stability, demonstrating the strong applicability and effectiveness of our framework in real-world advertising deployments.
CL · Jan 25, 2023
Improving the Inference of Topic Models via Infinite Latent State Replications
Daniel Rugeles, Zhen Hai, Juan Felipe Carmona et al.
In text mining, topic models are a type of probabilistic generative model for inferring latent semantic topics from a text corpus. One of the most popular inference approaches for topic models is perhaps collapsed Gibbs sampling (CGS), which typically samples one single topic label for each observed document-word pair. In this paper, we aim to improve the inference of CGS for topic models. We propose to leverage a state augmentation technique by increasing the number of topic samples to infinity, and then develop a new inference approach, called infinite latent state replication (ILR), to generate robust soft topic assignments for each given document-word pair. Experimental results on publicly available datasets show that ILR outperforms CGS for inference of existing established topic models.
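For reference, standard collapsed Gibbs sampling for LDA draws each token's topic from the conditional p(z = k | rest) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ), where the counts exclude the current token and V is the vocabulary size. The sketch below computes that normalized conditional: CGS samples one hard label from it, whereas a soft assignment keeps the whole vector. It illustrates the hard-versus-soft distinction only; it is not the ILR update, whose exact form the abstract does not give, and the function and argument names are hypothetical.

```python
def cgs_topic_conditional(n_dk, n_kw, n_k, alpha, beta, vocab_size):
    """Normalized collapsed-Gibbs conditional p(z = k | rest) for one
    document-word pair in LDA. All counts must already exclude the current token.
      n_dk[k]: tokens in this document assigned to topic k
      n_kw[k]: occurrences of this word type assigned to topic k
      n_k[k]:  total tokens assigned to topic k"""
    weights = [
        (n_dk[k] + alpha) * (n_kw[k] + beta) / (n_k[k] + vocab_size * beta)
        for k in range(len(n_k))
    ]
    total = sum(weights)
    return [w / total for w in weights]
```

A standard CGS step would then draw a single label, for example with `random.choices(range(len(probs)), weights=probs)[0]`, discarding the rest of the distribution.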
LG · Feb 13
Bus-Conditioned Zero-Shot Trajectory Generation via Task Arithmetic
Shuai Liu, Ning Cao, Yile Chen et al.
Mobility trajectory data provide essential support for smart city applications. However, such data are often difficult to obtain. Meanwhile, most existing trajectory generation methods implicitly assume that at least a subset of real mobility data from the target city is available, which limits their applicability in data-inaccessible scenarios. In this work, we propose a new problem setting, called bus-conditioned zero-shot trajectory generation, where no mobility trajectories from a target city are accessible. The generation process relies solely on source-city mobility data and publicly available bus timetables from both cities. Under this setting, we propose MobTA, the first approach to introduce task arithmetic into trajectory generation. MobTA models the parameter shift from bus-timetable-based trajectory generation to mobility trajectory generation in the source city, and applies this shift to the target city through arithmetic operations on task vectors. This enables trajectory generation that reflects target-city mobility patterns without requiring any real mobility data from that city. Furthermore, we theoretically analyze MobTA's stability across base and instruction-tuned LLMs. Extensive experiments show that MobTA significantly outperforms existing methods and achieves performance close to that of models finetuned on target-city mobility trajectories.
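The task-vector arithmetic described above can be sketched as follows, treating each model as a dict of named parameter lists: the shift from the source city's bus-timetable model to its mobility model is computed once and added to the target city's bus-timetable model. The names, the plain-list representation, and the scaling coefficient `scale` (a knob commonly tuned in task-arithmetic work) are assumptions; MobTA's actual procedure operates on LLM weights and may differ in detail.

```python
def task_vector(finetuned, base):
    """Parameter-wise difference finetuned - base.
    Both arguments map parameter names to equal-length lists of floats."""
    return {name: [f - b for f, b in zip(finetuned[name], base[name])] for name in base}

def apply_task_vector(params, tau, scale=1.0):
    """Return params + scale * tau, parameter by parameter."""
    return {name: [p + scale * t for p, t in zip(params[name], tau[name])] for name in params}

# Hypothetical usage with toy "models":
#   tau = task_vector(src_mobility_model, src_bus_model)   # bus -> mobility shift
#   tgt_mobility_model = apply_task_vector(tgt_bus_model, tau)
```

Because the shift is estimated entirely in the source city, no target-city mobility trajectories are touched; only the target city's bus-timetable model is needed.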
DB · Sep 24, 2025 · Code
ARCADE: A Real-Time Data System for Hybrid and Continuous Query Processing across Diverse Data Modalities
Jingyi Yang, Songsong Mo, Jiachen Shi et al.
The explosive growth of multimodal data spanning text, image, video, spatial, and relational modalities, coupled with the need for real-time semantic search and retrieval over these data, has outpaced the capabilities of existing multimodal and real-time database systems, which either lack efficient ingestion and continuous query capability or fall short in supporting expressive hybrid analytics. We introduce ARCADE, a real-time data system that efficiently supports high-throughput ingestion and expressive hybrid and continuous query processing across diverse data types. ARCADE introduces a unified disk-based secondary index on LSM-based storage for vector, spatial, and text data modalities, a comprehensive cost-based query optimizer for hybrid queries, and an incremental materialized view framework for efficient continuous queries. Built on open-source RocksDB storage and the MySQL query engine, ARCADE outperforms leading multimodal data systems by up to 7.4x on read-heavy and 1.4x on write-heavy workloads.
LG · Oct 11, 2024 · Code
NextLocLLM: Location Semantics Modeling and Coordinate-Based Next Location Prediction with LLMs
Shuai Liu, Ning Cao, Yile Chen et al.
Next location prediction is a critical task in human mobility analysis. Existing methods typically formulate it as a classification task over discrete location IDs, which hinders spatial continuity modeling and limits generalization to new cities. In this paper, we propose NextLocLLM, a novel framework that reformulates next-location prediction as coordinate regression and integrates LLMs for both location semantics encoding and coordinate-level prediction. To model location functional semantics, it constructs LLM-enhanced POI embeddings by leveraging the language understanding capabilities of LLMs to extract functional semantics from textual descriptions of POI categories. These POI embeddings are combined with spatiotemporal trajectory representations and fed into the same LLM, enabling unified semantic and predictive modeling. A lightweight regression head generates coordinate outputs, which are mapped to top-k candidate locations via a post-prediction retrieval module, ensuring structured outputs. Experiments across diverse cities show that NextLocLLM outperforms existing baselines in both supervised and zero-shot settings. Code is available at: https://github.com/liuwj2000/NexelocLLM.
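A sketch of the post-prediction retrieval step: a regressed (latitude, longitude) is mapped to the k nearest candidate locations. The abstract does not specify the distance measure, so great-circle (haversine) distance on a spherical Earth, the candidate dictionary format, and the function names are assumptions.

```python
import heapq
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth (R = 6371 km)."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def top_k_locations(pred_lat, pred_lon, candidates, k):
    """Map a regressed coordinate to the IDs of the k nearest candidates,
    nearest first. candidates: dict of location_id -> (lat, lon)."""
    return heapq.nsmallest(
        k, candidates,
        key=lambda loc: haversine_km(pred_lat, pred_lon, *candidates[loc]),
    )
```

`heapq.nsmallest` avoids sorting the full candidate set when k is small; for large candidate sets, a spatial index (for example a k-d tree or ball tree) would be the usual replacement for this linear scan.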
DB · Sep 13, 2021 · Code
Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation
Yuxing Han, Ziniu Wu, Peizhi Wu et al.
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query optimizer in a DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based ones) have been proposed with outstanding estimation accuracy and inference latency. However, no existing study systematically evaluates the quality of these methods and answers the fundamental question: to what extent can these methods improve the performance of the query optimizer in real-world settings, which is the ultimate goal of a CardEst method? In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real DBMS. We establish a new benchmark for CardEst, which contains a new complex real-world dataset, STATS, and a diverse query workload, STATS-CEB. We integrate several of the most representative CardEst methods into the open-source database system PostgreSQL, and comprehensively evaluate their true effectiveness in improving query plan quality, as well as other important aspects affecting their applicability, ranging from inference latency, model size, and training time to update efficiency and accuracy. We obtain a number of key findings for the CardEst methods under different data and query settings. Furthermore, we find that the widely used estimation accuracy metric (Q-Error) cannot distinguish the importance of different sub-plan queries during query optimization and thus cannot truly reflect the quality of query plans generated by CardEst methods. Therefore, we propose a new metric, P-Error, to evaluate the performance of CardEst methods, which overcomes the limitation of Q-Error and is able to reflect the overall end-to-end performance of CardEst methods. We have made all of the benchmark data and evaluation code publicly available at https://github.com/Nathaniel-Han/End-to-End-CardEst-Benchmark.
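Q-Error, as commonly defined, is the factor by which an estimate differs from the true cardinality, taken symmetrically. The sketch below shows that it scores each (sub-)query in isolation: a 10x error gets the same score whether it occurs on a sub-plan that decides the join order or on one that barely affects the plan, which is the limitation the paper points out. P-Error, roughly, instead compares the cost of the plan chosen with estimated cardinalities against the cost of the plan chosen with true cardinalities. The function name and the positivity check are illustrative choices.

```python
def q_error(estimate, actual):
    """Q-Error = max(est / actual, actual / est), for positive cardinalities.
    A perfect estimate scores 1.0; over- and under-estimates by the same
    factor receive the same score."""
    if estimate <= 0 or actual <= 0:
        raise ValueError("cardinalities must be positive")
    return max(estimate / actual, actual / estimate)
```

For example, `q_error(100, 10)` and `q_error(10, 100)` both return 10.0, and `q_error(5, 5)` returns 1.0.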
LG · Feb 6, 2024
AirPhyNet: Harnessing Physics-Guided Neural Networks for Air Quality Prediction
Kethmi Hirushini Hettige, Jiahao Ji, Shili Xiang et al.
Air quality prediction and modelling play a pivotal role in public health and environmental management, enabling individuals and authorities to make informed decisions. Although traditional data-driven models have shown promise in this domain, their long-term prediction accuracy can be limited, especially in scenarios with sparse or incomplete data. Moreover, they often rely on black-box deep learning structures that lack a solid physical foundation, which reduces the transparency and interpretability of their predictions. To address these limitations, this paper presents a novel approach named Physics-guided Neural Network for Air Quality Prediction (AirPhyNet). Specifically, we leverage two well-established physics principles of air particle movement (diffusion and advection) by representing them as differential equation networks. Then, we utilize a graph structure to integrate physics knowledge into a neural network architecture and exploit latent representations to capture spatio-temporal relationships within the air quality data. Experiments on two real-world benchmark datasets demonstrate that AirPhyNet outperforms state-of-the-art models in different testing scenarios, including different lead times (24h, 48h, 72h), sparse data, and sudden change prediction, reducing prediction errors by up to 10%. Moreover, a case study further validates that our model captures underlying physical processes of particle movement and generates accurate predictions with real physical meaning.
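As a rough illustration of the diffusion principle on a graph (not AirPhyNet's network, whose equations the abstract does not give), one explicit-Euler step of dx/dt = -k L x, with L = D - A the graph Laplacian, moves concentration from higher to lower nodes along edges while conserving the total amount. Advection and any learned parameters are omitted, and the function name and list-based inputs are hypothetical.

```python
def graph_diffusion_step(x, adjacency, k, dt):
    """One explicit-Euler step of dx/dt = -k * L x, where L = D - A.
    x: list of node concentrations.
    adjacency: symmetric list-of-lists of non-negative edge weights."""
    n = len(x)
    # (L x)_i = sum_j A_ij * (x_i - x_j): net outflow from node i to its neighbors.
    lx = [
        sum(adjacency[i][j] * (x[i] - x[j]) for j in range(n))
        for i in range(n)
    ]
    return [x[i] - dt * k * lx[i] for i in range(n)]
```

With two connected nodes holding concentrations [1.0, 0.0], k = 1, and dt = 0.1, one step gives approximately [0.9, 0.1]; the sum stays at 1.0 because L has zero row sums and is symmetric. Explicit Euler is only stable for small enough dt * k relative to the largest Laplacian eigenvalue.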
DB · Apr 19, 2024
LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency
Zhaodonghui Li, Haitao Yuan, Huiming Wang et al.
Query rewrite, which aims to generate more efficient queries by altering a SQL query's structure without changing the query result, has been an important research problem. To maintain equivalence between the rewritten query and the original one during rewriting, traditional query rewrite methods always rewrite queries following certain rewrite rules. However, some problems remain. First, existing methods for finding the optimal choice or sequence of rewrite rules are still limited, and the process is always resource-intensive. Methods that discover new rewrite rules typically require complicated proofs of structural logic or extensive user interaction. Second, current query rewrite methods usually rely heavily on DBMS cost estimators, which are often inaccurate. In this paper, we address these problems by proposing a novel query rewrite method named LLM-R2, which adopts a large language model (LLM) to propose possible rewrite rules for a database rewrite system. To further improve the inference ability of the LLM in recommending rewrite rules, we train a contrastive model with curriculum learning to learn query representations and select effective query demonstrations for the LLM. Experimental results show that our method significantly improves query execution efficiency and outperforms the baseline methods. In addition, our method is highly robust across different datasets.
DB · Mar 12, 2025
Foundation Models for Spatio-Temporal Data Science: A Tutorial and Survey
Yuxuan Liang, Haomin Wen, Yutong Xia et al.
Spatio-Temporal (ST) data science, which includes sensing, managing, and mining large-scale data across space and time, is fundamental to understanding complex systems in domains such as urban computing, climate science, and intelligent transportation. Traditional deep learning approaches have significantly advanced this field, particularly in the stage of ST data mining. However, these models remain task-specific and often require extensive labeled data. Inspired by the success of Foundation Models (FM), especially large language models, researchers have begun exploring the concept of Spatio-Temporal Foundation Models (STFMs) to enhance adaptability and generalization across diverse ST tasks. Unlike prior architectures, STFMs empower the entire workflow of ST data science, ranging from data sensing, management, to mining, thereby offering a more holistic and scalable approach. Despite rapid progress, a systematic study of STFMs for ST data science remains lacking. This survey aims to provide a comprehensive review of STFMs, categorizing existing methodologies and identifying key research directions to advance ST general intelligence.
AI · Dec 22, 2023
AdapTraj: A Multi-Source Domain Generalization Framework for Multi-Agent Trajectory Prediction
Tangwen Qian, Yile Chen, Gao Cong et al.
Multi-agent trajectory prediction, as a critical task in modeling complex interactions of objects in dynamic systems, has attracted significant research attention in recent years. Despite the promising advances, existing studies all follow the assumption that the data distribution observed during model learning matches that encountered in real-world deployments. However, this assumption often does not hold in practice, as inherent distribution shifts might exist in the mobility patterns of deployment environments, thus leading to poor domain generalization and performance degradation. Consequently, it is appealing to leverage trajectories from multiple source domains to mitigate such discrepancies in the multi-agent trajectory prediction task. However, developing multi-source domain generalization for this task presents two notable issues: (1) negative transfer; (2) inadequate modeling of external factors. To address these issues, we propose a new causal formulation to explicitly model four types of features: domain-invariant and domain-specific features for both the focal agent and neighboring agents. Building upon the new formulation, we propose AdapTraj, a multi-source domain generalization framework specifically tailored for multi-agent trajectory prediction. AdapTraj serves as a plug-and-play module that is adaptable to a variety of models. Extensive experiments on four datasets with different domains demonstrate that AdapTraj consistently outperforms other baselines by a substantial margin.
LG · Mar 18, 2024
Semantic-Enhanced Representation Learning for Road Networks with Temporal Dynamics
Yile Chen, Xiucheng Li, Gao Cong et al.
In this study, we introduce a novel framework called Toast for learning general-purpose representations of road networks, along with its advanced counterpart DyToast, designed to enhance the integration of temporal dynamics to boost the performance of various time-sensitive downstream tasks. Specifically, we propose to encode two pivotal semantic characteristics intrinsic to road networks: traffic patterns and traveling semantics. To achieve this, we refine the skip-gram module by incorporating auxiliary objectives aimed at predicting the traffic context associated with a target road segment. Moreover, we leverage trajectory data and design pre-training strategies based on Transformer to distill traveling semantics on road networks. DyToast further augments this framework by employing unified trigonometric functions characterized by their beneficial properties, enabling the capture of temporal evolution and dynamic nature of road networks more effectively. With these proposed techniques, we can obtain representations that encode multi-faceted aspects of knowledge within road networks, applicable across both road segment-based applications and trajectory-based applications. Extensive experiments on two real-world datasets across three tasks demonstrate that our proposed framework consistently outperforms the state-of-the-art baselines by a significant margin.
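DyToast's exact trigonometric functions are not specified in the abstract. A common way to encode cyclic time with sine and cosine, shown here as an assumption-laden sketch, maps time of day to multi-frequency sin/cos features so that 23:59 and 00:00 land close together instead of at opposite ends of a linear scale. The 24-hour period, the number of frequencies, and the function name are illustrative choices.

```python
import math

def time_of_day_encoding(seconds_since_midnight, num_frequencies=2):
    """Cyclic encoding [sin(2*pi*f*t/T), cos(2*pi*f*t/T)] for f = 1..F,
    with period T = 86400 s, so the encoding wraps around at midnight."""
    period = 86400.0
    features = []
    for f in range(1, num_frequencies + 1):
        angle = 2 * math.pi * f * seconds_since_midnight / period
        features.extend([math.sin(angle), math.cos(angle)])
    return features
```

Higher frequencies let a downstream model separate nearby times more finely, while the f = 1 pair alone already captures the daily cycle.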
DB · Nov 1, 2024
CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation
Ziting Wang, Haitao Yuan, Wei Dong et al.
Large Language Models (LLMs) have demonstrated remarkable generation capabilities but often struggle to access up-to-date information, which can lead to hallucinations. Retrieval-Augmented Generation (RAG) addresses this issue by incorporating knowledge from external databases, enabling more accurate and relevant responses. Due to the context window constraints of LLMs, it is impractical to input the entire external database context directly into the model. Instead, only the most relevant pieces of information, referred to as chunks, are selectively retrieved. However, current RAG research faces three key challenges. First, existing solutions often select each chunk independently, overlooking potential correlations among them. Second, in practice the utility of chunks is non-monotonic, meaning that adding more chunks can decrease overall utility. Traditional methods emphasize maximizing the number of included chunks, which can inadvertently compromise performance. Third, each type of user query possesses unique characteristics that require tailored handling, an aspect that current approaches do not fully consider. To overcome these challenges, we propose CORAG, a cost-constrained retrieval optimization system for retrieval-augmented generation. We employ a Monte Carlo Tree Search (MCTS) based policy framework to find optimal chunk combinations sequentially, allowing for a comprehensive consideration of correlations among chunks. Additionally, rather than viewing budget exhaustion as a termination condition, we integrate budget constraints into the optimization of chunk combinations, effectively addressing the non-monotonicity of chunk utility.
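CORAG searches chunk combinations with MCTS. The sketch below swaps in plain exhaustive search over a tiny, made-up candidate set so it can isolate the two ideas in the abstract. First, a combination's utility is not the sum of independent chunk scores and can drop when chunks are added. Second, the token budget is a constraint on the search rather than a "fill until full" stopping rule. The chunks, the utility function, and its penalty weights are all hypothetical.

```python
from itertools import combinations

# Hypothetical candidate chunks: (chunk_id, relevance, token_cost, topic).
CHUNKS = [
    ("c1", 0.9, 300, "pricing"),
    ("c2", 0.8, 250, "pricing"),  # overlaps heavily with c1
    ("c3", 0.6, 200, "history"),
    ("c4", 0.1, 400, "misc"),
]


def utility(selection):
    """Toy combination utility that is deliberately non-monotonic:
    relevance minus a redundancy penalty and a context-length penalty."""
    score = sum(rel for _, rel, _, _ in selection)
    topics = [topic for _, _, _, topic in selection]
    score -= 0.7 * (len(topics) - len(set(topics)))  # repeated topics
    score -= 0.0005 * sum(cost for _, _, cost, _ in selection)  # longer context
    return score


def best_combination(chunks, budget):
    """Exhaustively score every chunk subset whose total cost fits the budget.

    The budget only filters the search space; it is not a signal to keep
    adding chunks until it runs out.
    """
    best, best_score = (), utility(())
    for r in range(1, len(chunks) + 1):
        for combo in combinations(chunks, r):
            if sum(cost for _, _, cost, _ in combo) > budget:
                continue
            score = utility(combo)
            if score > best_score:
                best, best_score = combo, score
    return [chunk_id for chunk_id, *_ in best], best_score


if __name__ == "__main__":
    print(best_combination(CHUNKS, budget=1200))
```

With a 1,200-token budget every chunk fits, yet the best combination is only `c1` and `c3`. Adding the redundant `c2` or the weak `c4` lowers the score. Exhaustive search is exponential in the number of candidates, which is why a guided search such as MCTS matters at realistic scales.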
AI · Oct 17, 2024
Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models
Tangwen Qian, Junhe Li, Yile Chen et al.
Modeling trajectory data with general-purpose dense representations has become a prevalent paradigm for various downstream applications, such as trajectory classification, travel time estimation and similarity computation. However, existing methods typically rely on trajectories from a single spatial view, limiting their ability to capture the rich contextual information that is crucial for gaining deeper insights into movement patterns across different geospatial contexts. To this end, we propose MVTraj, a novel multi-view modeling method for trajectory representation learning. MVTraj integrates diverse contextual knowledge, from GPS to road networks and points of interest, to provide a more comprehensive understanding of trajectory data. To align the learning process across multiple views, we utilize GPS trajectories as a bridge and employ self-supervised pretext tasks to capture and distinguish movement patterns across different spatial views. Following this, we treat trajectories from different views as distinct modalities and apply a hierarchical cross-modal interaction module to fuse the representations, thereby enriching the knowledge derived from multiple sources. Extensive experiments on real-world datasets demonstrate that MVTraj significantly outperforms existing baselines in tasks associated with various spatial views, validating its effectiveness and practical utility in spatio-temporal modeling.
CL · Jun 8, 2025
Enhancing Large Language Models for Mobility Analytics with Semantic Location Tokenization
Yile Chen, Yicheng Tao, Yue Jiang et al.
The widespread adoption of location-based services has led to the generation of vast amounts of mobility data, providing significant opportunities to model user movement dynamics within urban environments. Recent advancements have focused on adapting Large Language Models (LLMs) for mobility analytics. However, existing methods face two primary limitations: inadequate semantic representation of locations (i.e., discrete IDs) and insufficient modeling of mobility signals within LLMs (i.e., single templated instruction fine-tuning). To address these issues, we propose QT-Mob, a novel framework that significantly enhances LLMs for mobility analytics. QT-Mob introduces a location tokenization module that learns compact, semantically rich tokens to represent locations, preserving contextual information while ensuring compatibility with LLMs. Furthermore, QT-Mob incorporates a series of complementary fine-tuning objectives that align the learned tokens with the internal representations in LLMs, improving the model's comprehension of sequential movement patterns and location semantics. The proposed QT-Mob framework not only enhances LLMs' ability to interpret mobility data but also provides a more generalizable approach for various mobility analytics tasks. Experiments on three real-world datasets demonstrate its superior performance in both next-location prediction and mobility recovery tasks, outperforming existing deep learning and LLM-based methods.
AI · May 30, 2025
Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction
Shuai Liu, Ning Cao, Yile Chen et al.
Next location prediction plays a critical role in understanding human mobility patterns. However, existing approaches face two core limitations: (1) they fall short in capturing the complex, multi-functional semantics of real-world locations; and (2) they lack the capacity to model heterogeneous behavioral dynamics across diverse user groups. To tackle these challenges, we introduce NextLocMoE, a novel framework built upon large language models (LLMs) and structured around a dual-level Mixture-of-Experts (MoE) design. Our architecture comprises two specialized modules: a Location Semantics MoE that operates at the embedding level to encode rich functional semantics of locations, and a Personalized MoE embedded within the Transformer backbone to dynamically adapt to individual user mobility patterns. In addition, we incorporate a history-aware routing mechanism that leverages long-term trajectory data to enhance expert selection and ensure prediction stability. Empirical evaluations across several real-world urban datasets show that NextLocMoE achieves superior performance in terms of predictive accuracy, cross-domain generalization, and interpretability.
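For readers unfamiliar with MoE layers, here is a minimal, generic sketch of sparse top-k expert routing: a linear gate scores each expert, only the top-k experts run, and their outputs are mixed by renormalized gate weights. It is not NextLocMoE's architecture. The history-aware routing, the two-level design, and the LLM backbone are all omitted, and the experts and gate weights in the example are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, gate_weights, top_k=2):
    """Sparse mixture-of-experts forward pass for one input vector.

    experts: callables mapping a vector to a vector of the same length.
    gate_weights: one weight vector per expert (hypothetical linear gate).
    Returns the mixed output and the indices of the experts that were used.
    """
    # Linear gate: one logit per expert, then a softmax over experts.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    probs = softmax(logits)
    # Keep only the top_k experts and renormalize their gate weights.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm
        out = [o + weight * y for o, y in zip(out, experts[i](x))]
    return out, chosen


if __name__ == "__main__":
    experts = [
        lambda v: [2 * a for a in v],  # expert 0: scales the input
        lambda v: [a + 1 for a in v],  # expert 1: shifts the input
        lambda v: [0.0 for _ in v],    # expert 2: outputs zeros
    ]
    gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    print(moe_forward([3.0, 1.0], experts, gate))
```

Only the selected experts are evaluated, which keeps compute roughly constant as experts are added. A routing signal such as a user's long-term history would enter through the gate logits.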
DB · Apr 2
BBC: Improving Large-k Approximate Nearest Neighbor Search with a Bucket-based Result Collector
Ziqi Yin, Gao Cong, Kai Zeng et al.
Although Approximate Nearest Neighbor (ANN) search has been extensively studied, large-k ANN queries that aim to retrieve a large number of nearest neighbors remain underexplored, despite their numerous real-world applications. Existing ANN methods suffer significant performance degradation on such queries. In this work, we first investigate the reasons for the performance degradation of quantization-based ANN indexes: (1) the inefficiency of existing top-k collectors, which incurs significant overhead in candidate maintenance, and (2) the reduced pruning effectiveness of quantization methods, which leads to a costly re-ranking process. To address this, we propose a novel bucket-based result collector (BBC) to enhance the efficiency of existing quantization-based ANN indexes for large-k ANN queries. BBC introduces two key components: (1) a bucket-based result buffer that organizes candidates into buckets by their distances to the query. This design reduces ranking costs and improves cache efficiency, enabling high-performance maintenance of a candidate superset and a lightweight final selection of top-k results. (2) Two re-ranking algorithms tailored for different types of quantization methods, which accelerate their re-ranking process by reducing either the number of candidate objects to be re-ranked or cache misses. Extensive experiments on real-world datasets demonstrate that BBC accelerates existing quantization-based ANN methods by up to 3.8x at recall@k = 0.95 for large-k ANN queries.
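The core idea of a bucket-based result buffer can be sketched in a few lines. Candidates are dropped into fixed-width distance buckets with no sorting, whole buckets that fit are accepted, and only the single bucket that straddles the k-th distance is sorted. This is a simplified, hypothetical illustration of that idea, not BBC's implementation. The assumed distance bounds `lo`/`hi` and the bucket count are illustrative parameters, and the re-ranking algorithms are not covered.

```python
def bucket_top_k(candidates, k, lo, hi, num_buckets=64):
    """Select the k closest candidates using distance buckets.

    candidates: list of (object_id, distance) pairs.
    lo, hi: assumed bounds on the distances; values outside are clamped
    into the first or last bucket, which keeps bucket order monotone.
    Returns the top-k set (unordered); only one bucket is ever sorted.
    """
    width = (hi - lo) / num_buckets
    buckets = [[] for _ in range(num_buckets)]
    for obj_id, dist in candidates:
        idx = int((dist - lo) / width)
        idx = min(max(idx, 0), num_buckets - 1)  # clamp out-of-range distances
        buckets[idx].append((obj_id, dist))

    result = []
    for bucket in buckets:
        if len(result) + len(bucket) <= k:
            result.extend(bucket)  # the whole bucket is inside the top-k
        else:
            # Boundary bucket: sort it and take only what is still needed.
            bucket.sort(key=lambda item: item[1])
            result.extend(bucket[: k - len(result)])
        if len(result) == k:
            break
    return result


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    cands = [(i, rng.random()) for i in range(10_000)]
    top = bucket_top_k(cands, k=500, lo=0.0, hi=1.0)
    print(len(top), max(d for _, d in top))
```

Appending to a bucket is O(1), whereas a heap-based top-k collector pays O(log k) per insertion. That difference becomes significant when k is large.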
LG · Aug 26, 2025
STRATA-TS: Selective Knowledge Transfer for Urban Time Series Forecasting with Retrieval-Guided Reasoning
Yue Jiang, Chenxi Liu, Yile Chen et al.
Urban forecasting models often face a severe data imbalance problem: only a few cities have dense, long-span records, while many others expose short or incomplete histories. Direct transfer from data-rich to data-scarce cities is unreliable because only a limited subset of source patterns truly benefits the target domain, whereas indiscriminate transfer risks introducing noise and negative transfer. We present STRATA-TS (Selective TRAnsfer via TArget-aware retrieval for Time Series), a framework that combines domain-adapted retrieval with reasoning-capable large models to improve forecasting in scarce data regimes. STRATA-TS employs a patch-based temporal encoder to identify source subsequences that are semantically and dynamically aligned with the target query. These retrieved exemplars are then injected into a retrieval-guided reasoning stage, where an LLM performs structured inference over target inputs and retrieved support. To enable efficient deployment, we distill the reasoning process into a compact open model via supervised fine-tuning. Extensive experiments on three parking availability datasets across Singapore, Nottingham, and Glasgow demonstrate that STRATA-TS consistently outperforms strong forecasting and transfer baselines, while providing interpretable knowledge transfer pathways.
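STRATA-TS uses a learned patch-based temporal encoder to find aligned source subsequences. As a much simpler stand-in, the sketch below slides a fixed-size window over a source series and ranks windows by z-normalized Euclidean distance to the target's most recent window, so matching is by shape rather than level. The patch size, stride, and distance are assumptions chosen for illustration, and the LLM reasoning and distillation stages are not shown.

```python
import math


def znorm(x):
    """Z-normalize a window so retrieval compares shape, not level or scale."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x)) or 1.0
    return [(v - mean) / std for v in x]


def retrieve_patches(query, source, size, stride=1, top_n=2):
    """Return start offsets of the `top_n` source patches whose z-normalized
    shape is closest (Euclidean) to the last `size` points of `query`."""
    q = znorm(query[-size:])
    scored = []
    for start in range(0, len(source) - size + 1, stride):
        patch = source[start : start + size]
        scored.append((math.dist(q, znorm(patch)), start))
    scored.sort()
    return [start for _, start in scored[:top_n]]


if __name__ == "__main__":
    source = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]  # hypothetical data-rich city
    target = [30, 20, 10]                    # data-scarce city, falling trend
    print(retrieve_patches(target, source, size=3))
```

In this toy example a falling target matches the falling stretches of the source, even though the absolute values differ by an order of magnitude.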
MM · Aug 25, 2025
Traj-MLLM: Can Multimodal Large Language Models Reform Trajectory Data Mining?
Shuo Liu, Di Yao, Yan Lin et al.
Building a general model capable of analyzing human trajectories across different geographic regions and tasks has become an emerging yet important problem for various applications. However, existing works suffer from a generalization problem, i.e., they are either restricted to training on specific regions or suitable for only a few tasks. Given the recent advances of multimodal large language models (MLLMs), we raise the question: can MLLMs reform current trajectory data mining and solve this problem? Nevertheless, due to the modality gap of trajectories, how to generate task-independent multimodal trajectory representations and how to adapt flexibly to different tasks remain the foundational challenges. In this paper, we propose Traj-MLLM, which is the first general framework using MLLMs for trajectory data mining. By integrating multiview contexts, Traj-MLLM transforms raw trajectories into interleaved image-text sequences while preserving key spatio-temporal characteristics, and directly utilizes the reasoning ability of MLLMs for trajectory analysis. Additionally, a prompt optimization method is proposed to finalize data-invariant prompts for task adaptation. Extensive experiments on four publicly available datasets show that Traj-MLLM outperforms state-of-the-art baselines by 48.05%, 15.52%, 51.52%, and 1.83% on travel time estimation, mobility prediction, anomaly detection, and transportation mode identification, respectively. Traj-MLLM achieves these results without requiring any training data or fine-tuning of the MLLM backbones.
CL · Jun 25, 2025
A Modular Multitask Reasoning Framework Integrating Spatio-temporal Models and LLMs
Kethmi Hirushini Hettige, Jiahao Ji, Cheng Long et al.
Spatio-temporal data mining plays a pivotal role in informed decision making across diverse domains. However, existing models are often restricted to narrow tasks, lacking the capacity for multi-task inference and complex long-form reasoning that require generation of in-depth, explanatory outputs. These limitations restrict their applicability to real-world, multi-faceted decision scenarios. In this work, we introduce STReason, a novel framework that integrates the reasoning strengths of large language models (LLMs) with the analytical capabilities of spatio-temporal models for multi-task inference and execution. Without requiring task-specific finetuning, STReason leverages in-context learning to decompose complex natural language queries into modular, interpretable programs, which are then systematically executed to generate both solutions and detailed rationales. To facilitate rigorous evaluation, we construct a new benchmark dataset and propose a unified evaluation framework with metrics specifically designed for long-form spatio-temporal reasoning. Experimental results show that STReason significantly outperforms advanced LLM baselines across all metrics, particularly excelling in complex, reasoning-intensive spatio-temporal scenarios. Human evaluations further validate STReason's credibility and practical utility, demonstrating its potential to reduce expert workload and broaden the applicability to real-world spatio-temporal tasks. We believe STReason provides a promising direction for developing more capable and generalizable spatio-temporal reasoning systems.
LG · May 19, 2025
TransferTraj: A Vehicle Trajectory Learning Model for Region and Task Transferability
Tonglong Wei, Yan Lin, Zeyu Zhou et al.
Vehicle GPS trajectories provide valuable movement information that supports various downstream tasks and applications. A desirable trajectory learning model should be able to transfer across regions and tasks without retraining, avoiding both the need to maintain multiple specialized models and subpar performance when training data is limited. However, each region has its unique spatial features and contexts, which are reflected in vehicle movement patterns and are difficult to generalize. Additionally, transferring across different tasks faces technical challenges due to the varying input-output structures required for each task. Existing efforts towards transferability primarily involve learning embedding vectors for trajectories, which perform poorly in region transfer and require retraining of prediction modules for task transfer. To address these challenges, we propose TransferTraj, a vehicle GPS trajectory learning model that excels in both region and task transferability. For region transferability, we introduce RTTE as the main learnable module within TransferTraj. It integrates spatial, temporal, POI, and road network modalities of trajectories to effectively manage variations in spatial context distribution across regions. It also introduces a TRIE module for incorporating relative information of spatial features and a spatial context MoE module for handling movement patterns in diverse contexts. For task transferability, we propose a task-transferable input-output scheme that unifies the input-output structure of different tasks into the masking and recovery of modalities and trajectory points. This approach allows TransferTraj to be pre-trained once and transferred to different tasks without retraining. Extensive experiments on three real-world vehicle trajectory datasets under task transfer, zero-shot region transfer, and few-shot region transfer validate TransferTraj's effectiveness.
LG · Oct 30, 2024
FlexTSF: A Flexible Forecasting Model for Time Series with Variable Regularities
Jingge Xiao, Yile Chen, Gao Cong et al.
Forecasting time series with irregular temporal structures remains challenging for universal pre-trained models. Existing approaches often assume regular sampling or depend heavily on imputation, limiting their applicability in real-world scenarios where irregularities are prevalent due to diverse sensing devices and recording practices. We introduce FlexTSF, a flexible forecasting model specifically designed for time series data with variable temporal regularities. At its foundation lies the IVP Patcher, a continuous-time patching module leveraging Initial Value Problems (IVPs) to inherently support uneven time intervals, variable sequence lengths, and missing values. FlexTSF employs a decoder-only architecture that integrates normalized timestamp inputs and domain-specific statistics through a specialized causal self-attention mechanism, enabling adaptability across domains. Extensive experiments on 16 datasets demonstrate FlexTSF's effectiveness, significantly outperforming existing models in classic forecasting scenarios, zero-shot generalization, and low-resource fine-tuning conditions. Ablation studies confirm the contributions of each design component and the advantage of not relying on predefined fixed patch lengths.
LG · Oct 18, 2024
PLMTrajRec: A Scalable and Generalizable Trajectory Recovery Method with Pre-trained Language Models
Tonglong Wei, Yan Lin, Youfang Lin et al.
Spatiotemporal trajectory data is crucial for various applications. However, issues such as device malfunctions and network instability often cause sparse trajectories, leading to lost detailed movement information. Recovering the missing points in sparse trajectories to restore the detailed information is thus essential. Despite recent progress, several challenges remain. First, the lack of large-scale dense trajectory data makes it difficult to train a trajectory recovery model from scratch. Second, the varying spatiotemporal correlations in sparse trajectories make it hard to generalize recovery across different sampling intervals. Third, the lack of location information complicates the extraction of road conditions for missing points. To address these challenges, we propose a novel trajectory recovery model called PLMTrajRec. It leverages the scalability of a pre-trained language model (PLM) and can be fine-tuned with only a limited set of dense trajectories. To handle different sampling intervals in sparse trajectories, we first convert each trajectory's sampling interval and movement features into natural language representations, allowing the PLM to recognize its interval. We then introduce a trajectory encoder to unify trajectories of varying intervals into a single interval and capture their spatiotemporal relationships. To obtain road conditions for missing points, we propose an area flow-guided implicit trajectory prompt, which models road conditions by collecting traffic flows in each region. We also introduce a road condition passing mechanism that uses observed points' road conditions to infer those of the missing points. Experiments on two public trajectory datasets with three sampling intervals each demonstrate the effectiveness, scalability, and generalization ability of PLMTrajRec.
LG · Jun 18, 2024
UrbanLLM: Autonomous Urban Activity Planning and Management with Large Language Models
Yue Jiang, Qin Chao, Yile Chen et al.
Location-based services play a critical role in improving the quality of our daily lives. Despite the proliferation of numerous specialized AI models within the spatio-temporal context of location-based services, these models struggle to autonomously tackle complex urban planning and management problems. To bridge this gap, we introduce UrbanLLM, a fine-tuned large language model (LLM) designed to tackle diverse problems in urban scenarios. UrbanLLM functions as a problem-solver by decomposing urban-related queries into manageable sub-tasks, identifying suitable spatio-temporal AI models for each sub-task, and generating comprehensive responses to the given queries. Our experimental results indicate that UrbanLLM significantly outperforms other established LLMs, such as Llama and the GPT series, in handling problems concerning complex urban activity planning and management. UrbanLLM exhibits considerable potential for solving problems in urban scenarios more effectively, reducing both the workload of human experts and the reliance on them.
LG · Jun 18, 2024
SAGDFN: A Scalable Adaptive Graph Diffusion Forecasting Network for Multivariate Time Series Forecasting
Yue Jiang, Xiucheng Li, Yile Chen et al.
Time series forecasting is important in many daily activities, and precisely modeling the complex correlations and shared patterns among multiple time series is key to improving forecasting performance. Spatial-Temporal Graph Neural Networks (STGNNs) are widely used in multivariate time series forecasting tasks and have achieved promising performance on multiple real-world datasets owing to their ability to model the underlying complex spatial and temporal dependencies. However, existing studies have mainly focused on datasets comprising only a few hundred sensors due to the heavy computational and memory costs of spatial-temporal GNNs. When applied to larger datasets, these methods fail to capture the underlying complex spatial dependencies and exhibit limited scalability and performance. To this end, we present a Scalable Adaptive Graph Diffusion Forecasting Network (SAGDFN) that captures complex spatial-temporal correlations in large-scale multivariate time series and thereby achieves strong performance in multivariate time series forecasting tasks. The proposed SAGDFN scales to datasets of thousands of nodes without requiring prior knowledge of the spatial correlation. Extensive experiments demonstrate that SAGDFN achieves performance comparable to state-of-the-art baselines on one real-world dataset of 207 nodes and outperforms all state-of-the-art baselines by a significant margin on three real-world datasets of 2,000 nodes.
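Graph diffusion forecasters build on a K-step random-walk diffusion operator, roughly sum over k of θ_k · P^k · X, where P = D⁻¹A is the row-normalized adjacency matrix and X holds node features. The sketch below implements only that generic operator with dense pure-Python matrices. SAGDFN's adaptive, learned graph and its scalability techniques are not reproduced, and the θ values are illustrative rather than learned.

```python
def row_normalize(adj):
    """Turn an adjacency matrix into a random-walk transition matrix D^-1 A."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([a / total if total else 0.0 for a in row])
    return out


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def diffusion_conv(adj, x, thetas):
    """Compute sum_k thetas[k] * P^k X, with P the transition matrix of adj.

    adj: n x n nonnegative weights; x: n x f node features;
    thetas: K + 1 scalar weights for diffusion steps 0..K.
    """
    p = row_normalize(adj)
    term = [row[:] for row in x]  # P^0 X = X
    out = [[thetas[0] * v for v in row] for row in term]
    for theta in thetas[1:]:
        term = matmul(p, term)  # advance one diffusion step: P^k X
        out = [[o + theta * t for o, t in zip(orow, trow)] for orow, trow in zip(out, term)]
    return out


if __name__ == "__main__":
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # three sensors on a line
    signal = [[1.0], [0.0], [0.0]]            # a spike at the first sensor
    print(diffusion_conv(path, signal, [1.0, 1.0, 1.0]))
```

Each step costs O(n²·f) with a dense P. This quadratic growth in the number of nodes is why methods targeting thousands of sensors rely on sparser or adaptively learned graphs.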
LG · Jun 6, 2024
Road Network Representation Learning with the Third Law of Geography
Haicang Zhou, Weiming Huang, Yile Chen et al.
Road network representation learning aims to learn compressed and effective vectorized representations for road segments that are applicable to numerous tasks. In this paper, we identify the limitations of existing methods, particularly their overemphasis on the distance effect as outlined in the First Law of Geography. In response, we propose to endow road network representation with the principles of the recent Third Law of Geography. To this end, we propose a novel graph contrastive learning framework that employs geographic configuration-aware graph augmentation and spectral negative sampling, ensuring that road segments with similar geographic configurations yield similar representations, and vice versa, aligning with the principles stated in the Third Law. The framework further fuses the Third Law with the First Law through a dual contrastive learning objective to effectively balance the implications of both laws. We evaluate our framework on two real-world datasets across three downstream tasks. The results show that the integration of the Third Law significantly improves the performance of road segment representations in downstream tasks.
CL · Mar 14, 2024
LAMP: A Language Model on the Map
Pasquale Balsebre, Weiming Huang, Gao Cong
Large Language Models (LLMs) are poised to play an increasingly important role in our lives, providing assistance across a wide array of tasks. In the geospatial domain, LLMs have demonstrated the ability to answer generic questions, such as identifying a country's capital; nonetheless, their utility is hindered when it comes to answering fine-grained questions about specific places, such as grocery stores or restaurants, which constitute essential aspects of people's everyday lives. This is mainly because the places in our cities have not been systematically fed into LLMs for them to understand and memorize. This study introduces a novel framework for fine-tuning a pre-trained model on city-specific data, to enable it to provide accurate recommendations while minimizing hallucinations. We share our model, LAMP, and the data used to train it. We conduct experiments to analyze its ability to correctly retrieve spatial objects, and compare it to well-known open- and closed-source language models, such as GPT-4. Finally, we explore its emerging capabilities through a case study on day planning.
AI · Feb 28, 2022
Points-of-Interest Relationship Inference with Spatial-enriched Graph Neural Networks
Yile Chen, Xiucheng Li, Gao Cong et al.
As a fundamental component of location-based services, inferring the relationships between points of interest (POIs) is critical for service providers to offer a good user experience to business owners and customers. Most existing methods for relationship inference are not targeted at POIs and thus fail to capture unique spatial characteristics that strongly affect POI relationships. In this work, we propose PRIM to tackle POI relationship inference for multiple relation types. PRIM features four novel components, including a weighted relational graph neural network, category taxonomy integration, a self-attentive spatial context extractor, and a distance-specific scoring function. Extensive experiments on two real-world datasets show that PRIM achieves the best results compared to state-of-the-art baselines, is robust against data sparsity, and is applicable to unseen cases in practice.
DB · Jul 26, 2021
A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation
Peizhi Wu, Gao Cong
Cardinality estimation is a fundamental problem in database systems. To capture the rich joint data distributions of a relational table, most existing work uses either the data as unsupervised information or the query workload as supervised information. Very little work uses both types of information, and the few methods that do cannot fully exploit both to learn the joint data distribution. In this work, we aim to close the gap between data-driven and query-driven methods by proposing a new unified deep autoregressive model, UAE, that learns the joint data distribution from both the data and the query workload. First, to enable using the supervised query information in the deep autoregressive model, we develop differentiable progressive sampling using the Gumbel-Softmax trick. Second, UAE is able to utilize both types of information to learn the joint data distribution in a single model. Comprehensive experimental results demonstrate that UAE achieves single-digit multiplicative error at the tail and better accuracy than state-of-the-art methods, and is both space- and time-efficient.
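The Gumbel-Softmax trick mentioned above replaces a non-differentiable categorical draw with a smooth, temperature-controlled approximation. Gumbel(0, 1) noise is added to each logit, followed by a softmax with temperature τ. The standalone sketch below shows just that trick in pure Python. How UAE embeds it in progressive sampling over an autoregressive model is not reproduced here, and the logits and τ in the example are arbitrary.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample from Categorical(softmax(logits)).

    Gumbel(0, 1) noise is added to each logit and the result is passed through a
    softmax with temperature tau. Small tau gives nearly one-hot samples; any
    tau > 0 keeps the output a smooth function of the logits.
    """
    noisy = []
    for logit in logits:
        u = rng.random() or 1e-12         # avoid log(0) if random() returns 0.0
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / tau)
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    # The argmax of a Gumbel-perturbed sample follows softmax(logits) exactly
    # (the Gumbel-max trick); for these logits that is 25% / 75%.
    logits = [0.0, math.log(3.0)]
    counts = [0, 0]
    for _ in range(20000):
        y = gumbel_softmax(logits, tau=0.5, rng=rng)
        counts[y.index(max(y))] += 1
    print([c / 20000 for c in counts])
```

In an autodiff framework the same expression lets gradients flow from a downstream loss (for example, an error on query cardinalities) back into the logits. A hard categorical sample would block that gradient.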
IR · Jun 9, 2021
Global Context Enhanced Graph Neural Networks for Session-based Recommendation
Ziyang Wang, Wei Wei, Gao Cong et al.
Session-based recommendation (SBR) is a challenging task, which aims at recommending items based on anonymous behavior sequences. Almost all the existing solutions for SBR model user preference only based on the current session without exploiting the other sessions, which may contain both relevant and irrelevant item-transitions to the current session. This paper proposes a novel approach, called Global Context Enhanced Graph Neural Networks (GCE-GNN) to exploit item transitions over all sessions in a more subtle manner for better inferring the user preference of the current session. Specifically, GCE-GNN learns two levels of item embeddings from session graph and global graph, respectively: (i) Session graph, which is to learn the session-level item embedding by modeling pairwise item-transitions within the current session; and (ii) Global graph, which is to learn the global-level item embedding by modeling pairwise item-transitions over all sessions. In GCE-GNN, we propose a novel global-level item representation learning layer, which employs a session-aware attention mechanism to recursively incorporate the neighbors' embeddings of each node on the global graph. We also design a session-level item representation learning layer, which employs a GNN on the session graph to learn session-level item embeddings within the current session. Moreover, GCE-GNN aggregates the learnt item representations in the two levels with a soft attention mechanism. Experiments on three benchmark datasets demonstrate that GCE-GNN outperforms the state-of-the-art methods consistently.
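The two graphs described above can be sketched from raw click sessions. The session graph gets directed edges between consecutive items. The global graph links items that appear within `eps` positions of each other in any session and weights each link by co-occurrence count. The `eps` window, the undirected weighting, and the dropped self-loops and edge types are simplifying assumptions for illustration. The attention layers that consume these graphs are not shown.

```python
from collections import Counter


def session_graph(session):
    """Directed edges between consecutive items within one session
    (self-loops from repeated clicks are dropped in this sketch)."""
    return {(a, b) for a, b in zip(session, session[1:]) if a != b}


def global_graph(sessions, eps=2):
    """Undirected item pairs that occur within `eps` positions of each other
    in any session, weighted by how often they co-occur."""
    weights = Counter()
    for s in sessions:
        for i, a in enumerate(s):
            for b in s[i + 1 : i + 1 + eps]:
                if a != b:
                    weights[frozenset((a, b))] += 1
    return weights


if __name__ == "__main__":
    sessions = [["v1", "v2", "v3"], ["v2", "v3", "v4"], ["v1", "v3"]]
    print(session_graph(sessions[0]))
    for pair, w in global_graph(sessions).most_common():
        print(sorted(pair), w)
```

Here `v1` and `v3` are linked in the global graph with weight 2, even though the current session `["v1", "v2", "v3"]` never shows them adjacent. That cross-session signal is what a global-level representation can draw on.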
DB · Mar 8, 2021
A Reinforcement Learning Based R-Tree for Spatial Data Indexing in Dynamic Environments
Tu Gu, Kaiyu Feng, Gao Cong et al.
Learned indices have been proposed to replace classic index structures like the B-Tree with machine learning (ML) models. They require replacing both the indices and the query processing algorithms currently deployed by databases, and such a radical departure is likely to encounter challenges and obstacles. In contrast, we propose a fundamentally different way of using ML techniques to improve the query performance of the classic R-Tree without changing its structure or query processing algorithms. Specifically, we develop reinforcement learning (RL) based models to decide how to choose a subtree for insertion and how to split a node when building an R-Tree, instead of relying on the hand-crafted heuristic rules currently used by the R-Tree and its variants. Experiments on real and synthetic datasets, the largest containing more than 100 million spatial objects, clearly show that our RL-based index outperforms the R-Tree and its variants in terms of query processing time.
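For context on what the RL models replace, the sketch below implements the classic hand-crafted ChooseSubtree rule from Guttman's original R-Tree insertion. It descends into the child whose minimum bounding rectangle needs the least area enlargement to cover the new object and breaks ties by smaller area. This is the baseline heuristic only, not the paper's learned policy. The `Rect` class is a minimal helper written for this example.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def union(self, other: "Rect") -> "Rect":
        """Smallest rectangle covering both self and other."""
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))


def choose_subtree(children: list[Rect], new_obj: Rect) -> int:
    """Least-enlargement heuristic: index of the child MBR that grows least
    when extended to cover new_obj (ties broken by smaller area)."""
    def cost(i: int):
        mbr = children[i]
        return (mbr.union(new_obj).area() - mbr.area(), mbr.area())
    return min(range(len(children)), key=cost)


if __name__ == "__main__":
    children = [Rect(0, 0, 2, 2), Rect(5, 5, 6, 6)]
    print(choose_subtree(children, Rect(1, 1, 1.5, 1.5)))  # inside the first MBR
    print(choose_subtree(children, Rect(6, 6, 7, 7)))      # next to the second
```

An RL agent in this setting would observe features such as these enlargement and area values for each child and learn which child to pick. The resulting tree stays a standard R-Tree, so query processing is unchanged.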
IR · Nov 20, 2020
Exploring Global Information for Session-based Recommendation
Ziyang Wang, Wei Wei, Gao Cong et al.
Session-based recommendation (SBR) is a challenging task, which aims at recommending items based on anonymous behavior sequences. Most existing SBR studies model user preferences based only on the current session while neglecting item-transition information from other sessions, and therefore cannot model complicated item-transition patterns. To address this limitation, we introduce global item-transition information to strengthen the modeling of dynamic item transitions. To fully exploit the global item-transition information, we study two ways of exploring global information for SBR. Specifically, we first propose a basic GNN-based framework (BGNN), which solely uses session-level item-transition information on the session graph. Based on BGNN, we propose a novel approach, called Session-based Recommendation with Global Information (SRGI), which infers user preferences by fully exploring global item transitions over all sessions from two different perspectives: (i) Fusion-based Model (SRGI-FM), which recursively incorporates the neighbor embeddings of each node on the global graph into the learning process of session-level item representations; and (ii) Constrained-based Model (SRGI-CM), which treats the global-level item-transition information as a constraint to ensure that the learned item embeddings are consistent with the global item transitions. Extensive experiments conducted on three popular benchmark datasets demonstrate that both SRGI-FM and SRGI-CM consistently outperform state-of-the-art methods.
DB · Mar 5, 2020
Efficient and Effective Similar Subtrajectory Search with Deep Reinforcement Learning
Zheng Wang, Cheng Long, Gao Cong et al.
Similar trajectory search is a fundamental problem and has been well studied over the past two decades. However, the similar subtrajectory search (SimSub) problem, which aims to return the portion of a trajectory (i.e., a subtrajectory) that is most similar to a query trajectory, has been mostly disregarded, even though it captures trajectory similarity at a finer granularity and many applications take subtrajectories as basic units for analysis. In this paper, we study the SimSub problem and develop a suite of algorithms, including both exact and approximate ones. Among the approximate algorithms, two that are based on deep reinforcement learning stand out and outperform the non-learning-based algorithms in terms of effectiveness and efficiency. We conduct experiments on real-world trajectory datasets, which verify the effectiveness and efficiency of the proposed algorithms.
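To make the SimSub problem concrete, here is a minimal exact search that uses dynamic time warping (DTW) as the similarity measure. For each start index, the DTW table is extended one data point at a time, so the distance to every subtrajectory starting there is obtained incrementally. This is an illustrative exact baseline under the assumption of DTW over 2-D points, not one of the paper's algorithms. The reinforcement-learning methods instead learn where to split the trajectory, to avoid this quadratic enumeration of start and end points.

```python
import math


def point_dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def best_subtrajectory(data, query):
    """Exact search for the subtrajectory of `data` with the smallest DTW
    distance to `query`.

    For each start index i, the DTW table is extended one data point at a time,
    so DTW(query, data[i..j]) is available for every end j without recomputing
    from scratch. Total cost is O(n^2 * m) for n data points and m query points.
    Returns (start, end, dtw) with inclusive indices.
    """
    m = len(query)
    best = (None, None, math.inf)
    for i in range(len(data)):
        # prev[r] = DTW(query[:r], data[i:j]) for the current end j; before the
        # first point is added the data prefix is empty, so only prev[0] is finite.
        prev = [0.0] + [math.inf] * m
        for j in range(i, len(data)):
            cur = [math.inf] * (m + 1)
            for r in range(1, m + 1):
                cost = point_dist(query[r - 1], data[j])
                cur[r] = cost + min(prev[r], prev[r - 1], cur[r - 1])
            if cur[m] < best[2]:
                best = (i, j, cur[m])
            prev = cur
    return best


if __name__ == "__main__":
    data = [(0, 0), (5, 5), (1, 0), (2, 0), (3, 0), (9, 9)]
    query = [(1, 0), (2, 0), (3, 0)]
    print(best_subtrajectory(data, query))
```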
LG · Dec 17, 2018
Representation Learning for Spatial Graphs
Zheng Wang, Ce Ju, Gao Cong et al.
Recently, the topic of graph representation learning has received plenty of attention. Existing approaches usually focus on structural properties only and are therefore insufficient for spatial graphs, where nodes are associated with spatial information. In this paper, we present the first deep learning approach, called s2vec, for learning spatial graph representations, which is based on the denoising autoencoders framework (DAF). We evaluate the learned representations on real datasets, and the results verify the effectiveness of s2vec when used for spatial clustering.
IR · Sep 5, 2018
HyperML: A Boosting Metric Learning Approach in Hyperbolic Space for Recommender Systems
Lucas Vinh Tran, Yi Tay, Shuai Zhang et al.
This paper investigates learning user and item representations in non-Euclidean space. Specifically, we study the connection between metric learning in hyperbolic space and collaborative filtering by exploring Möbius gyrovector spaces, whose formalism can be used to generalize the most common Euclidean vector operations. Overall, this work aims to bridge the gap between Euclidean and hyperbolic geometry in recommender systems through a metric learning approach. We propose HyperML (Hyperbolic Metric Learning), a conceptually simple but highly effective model for boosting recommendation performance. Through a series of extensive experiments, we show that HyperML not only outperforms its Euclidean counterparts but also achieves state-of-the-art performance on multiple benchmark datasets, demonstrating the effectiveness of personalized recommendation in hyperbolic geometry.
AI · Apr 12, 2018
Interact and Decide: Medley of Sub-Attention Networks for Effective Group Recommendation
Lucas Vinh Tran, Tuan-Anh Nguyen Pham, Yi Tay et al.
This paper proposes Medley of Sub-Attention Networks (MoSAN), a novel neural architecture for the group recommendation task. Group-level recommendation is known to be challenging because intricate group dynamics must be considered. This contrasts with the standard recommendation problem, where recommendations are personalized for a single user. Our approach hinges on the key intuition that group decision making is generally dynamic, i.e., a user's decision depends heavily on the other group members. This motivates an attentive neural model that captures fine-grained interactions between group members. In MoSAN, each sub-attention module represents a single member and models that user's preference with respect to all other group members. The Medley of Sub-Attention modules then collectively makes the group's final decision. Overall, the proposed model is both expressive and effective. Through a series of extensive experiments, we show that MoSAN not only achieves state-of-the-art performance but also improves on standard baselines by a considerable margin.
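The structure described above has two levels: one sub-attention module per member, each attending over the other members, followed by a combination of all modules. The sketch below illustrates that structure under simplifying assumptions. It uses dot-product attention scores, sums the module outputs to form the group representation, and scores items by inner product, and it has no learned parameters. MoSAN's actual scoring network and aggregation may differ, and the function names are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def sub_attention(i, members):
    """Sub-attention module for member i: attend over all OTHER members.

    Returns (context vector, attention weights over the other members).
    """
    if len(members) < 2:
        raise ValueError("a group needs at least two members")
    others = np.delete(members, i, axis=0)    # every member except i
    weights = softmax(others @ members[i])    # how much member i attends to each other member
    return weights @ others, weights


def group_representation(members):
    """Combine one sub-attention module per member (here: a plain sum)."""
    return sum(sub_attention(i, members)[0] for i in range(len(members)))


def score_items(members, items):
    """Score each item (one row of `items`) against the group representation."""
    return items @ group_representation(members)
```

Each member gets its own attention distribution over the rest of the group, so a member's influence can differ depending on who else is present. A single group-level average cannot express that kind of interaction.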
LG · Feb 19, 2018
Heron Inference for Bayesian Graphical Models
Daniel Rugeles, Zhen Hai, Gao Cong et al.
Bayesian graphical models have been shown to be a powerful tool for discovering uncertainty and causal structure in real-world data across many application fields. Current inference methods strike different trade-offs between computational complexity and predictive accuracy. At one end of the spectrum, variational inference approaches are computationally efficient, while at the other end, Gibbs sampling approaches are known to be relatively accurate for prediction in practice. In this paper, we extend an existing Gibbs sampling method and propose a new deterministic inference method, Heron inference (Heron), for a family of Bayesian graphical models. Besides supporting nontrivial distributability, Heron makes it easy to assess convergence status and substantially improves running efficiency. We evaluate Heron against the standard collapsed Gibbs sampler and a state-of-the-art state augmentation method for inference in well-known graphical models. Experimental results on publicly available real-life data demonstrate that Heron significantly outperforms the baseline methods for inferring Bayesian graphical models.
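The abstract does not give Heron's update rule, so this sketch covers only the baseline it is compared against: the standard collapsed Gibbs sampler. It instantiates that sampler for latent Dirichlet allocation (LDA), which is an assumed example of a "well-known graphical model". Documents are lists of integer word ids. The hyperparameters and the name `collapsed_gibbs_lda` are illustrative assumptions. Each step removes one token's topic assignment, samples a new topic from the collapsed full conditional, and restores the counts.

```python
import random


def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Standard collapsed Gibbs sampling for LDA.

    Returns (z, ndk, nkw): per-token topic assignments, document-topic counts,
    and topic-word counts after `iters` sweeps.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]              # document-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                               # tokens assigned to each topic
    z = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][n]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | rest), up to a normalising constant.
                p = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                     for t in topics]
                k = rng.choices(topics, weights=p)[0]
                # Record the new assignment.
                z[d][n] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return z, ndk, nkw
```

Because every update is a random draw, assessing convergence of this sampler usually requires monitoring statistics across many sweeps. The abstract identifies convergence assessment as one of the points Heron is designed to make easier.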