Leilei Sun

LG
h-index32
35papers
1,952citations
Novelty55%
AI Score60

35 Papers

LGMar 23, 2023Code
Towards Better Dynamic Graph Learning: New Architecture and Unified Library

Le Yu, Leilei Sun, Bowen Du et al.

We propose DyGFormer, a new Transformer-based architecture for dynamic graph learning. DyGFormer is conceptually simple and only needs to learn from nodes' historical first-hop interactions by: (1) a neighbor co-occurrence encoding scheme that explores the correlations of the source node and destination node based on their historical sequences; (2) a patching technique that divides each sequence into multiple patches and feeds them to Transformer, allowing the model to effectively and efficiently benefit from longer histories. We also introduce DyGLib, a unified library with standard training pipelines, extensible coding interfaces, and comprehensive evaluating protocols to promote reproducible, scalable, and credible dynamic graph learning research. By performing exhaustive experiments on thirteen datasets for dynamic link prediction and dynamic node classification tasks, we find that DyGFormer achieves state-of-the-art performance on most of the datasets, demonstrating its effectiveness in capturing nodes' correlations and long-term temporal dependencies. Moreover, some results of baselines are inconsistent with previous reports, which may be caused by their diverse but less rigorous implementations, showing the importance of DyGLib. All the used resources are publicly available at https://github.com/yule-BUAA/DyGLib.

LGFeb 20, 2023Code
PriSTI: A Conditional Diffusion Framework for Spatiotemporal Imputation

Mingzhe Liu, Han Huang, Hao Feng et al.

Spatiotemporal data mining plays an important role in air quality monitoring, crowd flow modeling, and climate forecasting. However, the originally collected spatiotemporal data in real-world scenarios is usually incomplete due to sensor failures or transmission loss. Spatiotemporal imputation aims to fill the missing values according to the observed values and the underlying spatiotemporal dependence of them. The previous dominant models impute missing values autoregressively and suffer from the problem of error accumulation. As emerging powerful generative models, the diffusion probabilistic models can be adopted to impute missing values conditioned by observations and avoid inferring missing values from inaccurate historical imputation. However, the construction and utilization of conditional information are inevitable challenges when applying diffusion models to spatiotemporal imputation. To address above issues, we propose a conditional diffusion framework for spatiotemporal imputation with enhanced prior modeling, named PriSTI. Our proposed framework provides a conditional feature extraction module first to extract the coarse yet effective spatiotemporal dependencies from conditional information as the global context prior. Then, a noise estimation module transforms random noise to realistic values, with the spatiotemporal attention weights calculated by the conditional feature, as well as the consideration of geographic relationships. PriSTI outperforms existing imputation methods in various missing patterns of different real-world spatiotemporal data, and effectively handles scenarios such as high missing rates and sensor failure. The implementation code is available at https://github.com/LMZZML/PriSTI.

LGJan 1, 2023Code
Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation

Han Huang, Leilei Sun, Bowen Du et al.

Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modeling distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDE) and derive discrete graph structures as the condition for reverse generative processes. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the effectiveness of our framework. Particularly, the proposed method still generates high-quality molecular graphs in a limited number of steps. Our code is provided in https://github.com/GRAPH-0/CDGS.

LGJul 30, 2024Code
DyGKT: Dynamic Graph Learning for Knowledge Tracing

Ke Cheng, Linzhi Peng, Pengyang Wang et al.

Knowledge Tracing aims to assess student learning states by predicting their performance in answering questions. Different from the existing research which utilizes fixed-length learning sequence to obtain the student states and regards KT as a static problem, this work is motivated by three dynamical characteristics: 1) The scales of students answering records are constantly growing; 2) The semantics of time intervals between the records vary; 3) The relationships between students, questions and concepts are evolving. The three dynamical characteristics above contain the great potential to revolutionize the existing knowledge tracing methods. Along this line, we propose a Dynamic Graph-based Knowledge Tracing model, namely DyGKT. In particular, a continuous-time dynamic question-answering graph for knowledge tracing is constructed to deal with the infinitely growing answering behaviors, and it is worth mentioning that it is the first time dynamic graph learning technology is used in this field. Then, a dual time encoder is proposed to capture long-term and short-term semantics among the different time intervals. Finally, a multiset indicator is utilized to model the evolving relationships between students, questions, and concepts via the graph structural feature. Numerous experiments are conducted on five real-world datasets, and the results demonstrate the superiority of our model. All the used resources are publicly available at https://github.com/PengLinzhi/DyGKT.

CLOct 19, 2023Code
Pretraining Language Models with Text-Attributed Heterogeneous Graphs

Tao Zou, Le Yu, Yifei Huang et al.

In many real-world scenarios (e.g., academic networks, social platforms), different types of entities are not only associated with texts but also connected by various relationships, which can be abstracted as Text-Attributed Heterogeneous Graphs (TAHGs). Current pretraining tasks for Language Models (LMs) primarily focus on separately learning the textual information of each entity and overlook the crucial aspect of capturing topological connections among entities in TAHGs. In this paper, we present a new pretraining framework for LMs that explicitly considers the topological and heterogeneous information in TAHGs. Firstly, we define a context graph as neighborhoods of a target node within specific orders and propose a topology-aware pretraining task to predict nodes involved in the context graph by jointly optimizing an LM and an auxiliary heterogeneous graph neural network. Secondly, based on the observation that some nodes are text-rich while others have little text, we devise a text augmentation strategy to enrich textless nodes with their neighbors' texts for handling the imbalance issue. We conduct link prediction and node classification tasks on three datasets from various domains. Experimental results demonstrate the superiority of our approach over existing methods and the rationality of each design. Our code is available at https://github.com/Hope-Rita/THLM.

LGSep 8, 2023Code
Self-optimizing Feature Generation via Categorical Hashing Representation and Hierarchical Reinforcement Crossing

Wangyang Ying, Dongjie Wang, Kunpeng Liu et al.

Feature generation aims to generate new and meaningful features to create a discriminative representation space.A generated feature is meaningful when the generated feature is from a feature pair with inherent feature interaction. In the real world, experienced data scientists can identify potentially useful feature-feature interactions, and generate meaningful dimensions from an exponentially large search space, in an optimal crossing form over an optimal generation path. But, machines have limited human-like abilities.We generalize such learning tasks as self-optimizing feature generation. Self-optimizing feature generation imposes several under-addressed challenges on existing systems: meaningful, robust, and efficient generation. To tackle these challenges, we propose a principled and generic representation-crossing framework to solve self-optimizing feature generation.To achieve hashing representation, we propose a three-step approach: feature discretization, feature hashing, and descriptive summarization. To achieve reinforcement crossing, we develop a hierarchical reinforcement feature crossing approach.We present extensive experimental results to demonstrate the effectiveness and efficiency of the proposed method. The code is available at https://github.com/yingwangyang/HRC_feature_cross.git.

LGJun 28, 2022
Learning the Evolutionary and Multi-scale Graph Structure for Multivariate Time Series Forecasting

Junchen Ye, Zihan Liu, Bowen Du et al.

Recent studies have shown great promise in applying graph neural networks for multivariate time series forecasting, where the interactions of time series are described as a graph structure and the variables are represented as the graph nodes. Along this line, existing methods usually assume that the graph structure (or the adjacency matrix), which determines the aggregation manner of graph neural network, is fixed either by definition or self-learning. However, the interactions of variables can be dynamic and evolutionary in real-world scenarios. Furthermore, the interactions of time series are quite different if they are observed at different time scales. To equip the graph neural network with a flexible and practical graph structure, in this paper, we investigate how to model the evolutionary and multi-scale interactions of time series. In particular, we first provide a hierarchical graph structure cooperated with the dilated convolution to capture the scale-specific correlations among time series. Then, a series of adjacency matrices are constructed under a recurrent manner to represent the evolving correlations at each layer. Moreover, a unified neural network is provided to integrate the components above to get the final prediction. In this way, we can capture the pair-wise correlations and temporal dependency simultaneously. Finally, experiments on both single-step and multi-step forecasting tasks demonstrate the superiority of our method over the state-of-the-art approaches.

LGMay 29, 2022Code
Dynamic Graph Learning Based on Hierarchical Memory for Origin-Destination Demand Prediction

Ruixing Zhang, Liangzhe Han, Boyi Liu et al.

Recent years have witnessed a rapid growth of applying deep spatiotemporal methods in traffic forecasting. However, the prediction of origin-destination (OD) demands is still a challenging problem since the number of OD pairs is usually quadratic to the number of stations. In this case, most of the existing spatiotemporal methods fail to handle spatial relations on such a large scale. To address this problem, this paper provides a dynamic graph representation learning framework for OD demands prediction. In particular, a hierarchical memory updater is first proposed to maintain a time-aware representation for each node, and the representations are updated according to the most recently observed OD trips in continuous-time and multiple discrete-time ways. Second, a spatiotemporal propagation mechanism is provided to aggregate representations of neighbor nodes along a random spatiotemporal route which treats origin and destination as two different semantic entities. Last, an objective function is designed to derive the future OD demands according to the most recent node representations, and also to tackle the data sparsity problem in OD prediction. Extensive experiments have been conducted on two real-world datasets, and the experimental results demonstrate the superiority of the proposed method. The code and data are available at https://github.com/Rising0321/HMOD.

LGDec 4, 2022
GraphGDP: Generative Diffusion Processes for Permutation Invariant Graph Generation

Han Huang, Leilei Sun, Bowen Du et al.

Graph generative models have broad applications in biology, chemistry and social science. However, modelling and understanding the generative process of graphs is challenging due to the discrete and high-dimensional nature of graphs, as well as permutation invariance to node orderings in underlying graph distributions. Current leading autoregressive models fail to capture the permutation invariance nature of graphs for the reliance on generation ordering and have high time complexity. Here, we propose a continuous-time generative diffusion process for permutation invariant graph generation to mitigate these issues. Specifically, we first construct a forward diffusion process defined by a stochastic differential equation (SDE), which smoothly converts graphs within the complex distribution to random graphs that follow a known edge probability. Solving the corresponding reverse-time SDE, graphs can be generated from newly sampled random graphs. To facilitate the reverse-time SDE, we newly design a position-enhanced graph score network, capturing the evolving structure and position information from perturbed graphs for permutation equivariant score estimation. Under the evaluation of comprehensive metrics, our proposed generative diffusion process achieves competitive performance in graph distribution learning. Experimental results also show that GraphGDP can generate high-quality graphs in only 24 function evaluations, much faster than previous autoregressive models.

LGJun 30, 2022
Continuous-Time and Multi-Level Graph Representation Learning for Origin-Destination Demand Prediction

Liangzhe Han, Xiaojian Ma, Leilei Sun et al.

Traffic demand forecasting by deep neural networks has attracted widespread interest in both academia and industry society. Among them, the pairwise Origin-Destination (OD) demand prediction is a valuable but challenging problem due to several factors: (i) the large number of possible OD pairs, (ii) implicitness of spatial dependence, and (iii) complexity of traffic states. To address the above issues, this paper proposes a Continuous-time and Multi-level dynamic graph representation learning method for Origin-Destination demand prediction (CMOD). Firstly, a continuous-time dynamic graph representation learning framework is constructed, which maintains a dynamic state vector for each traffic node (metro stations or taxi zones). The state vectors keep historical transaction information and are continuously updated according to the most recently happened transactions. Secondly, a multi-level structure learning module is proposed to model the spatial dependency of station-level nodes. It can not only exploit relations between nodes adaptively from data, but also share messages and representations via cluster-level and area-level virtual nodes. Lastly, a cross-level fusion module is designed to integrate multi-level memories and generate comprehensive node representations for the final prediction. Extensive experiments are conducted on two real-world datasets from Beijing Subway and New York Taxi, and the results demonstrate the superiority of our model against the state-of-the-art approaches.

LGAug 24, 2024Code
RePST: Language Model Empowered Spatio-Temporal Forecasting via Semantic-Oriented Reprogramming

Hao Wang, Jindong Han, Wei Fan et al.

Spatio-temporal forecasting is pivotal in numerous real-world applications, including transportation planning, energy management, and climate monitoring. In this work, we aim to harness the reasoning and generalization abilities of Pre-trained Language Models (PLMs) for more effective spatio-temporal forecasting, particularly in data-scarce scenarios. However, recent studies uncover that PLMs, which are primarily trained on textual data, often falter when tasked with modeling the intricate correlations in numerical time series, thereby limiting their effectiveness in comprehending spatio-temporal data. To bridge the gap, we propose RePST, a semantic-oriented PLM reprogramming framework tailored for spatio-temporal forecasting. Specifically, we first propose a semantic-oriented decomposer that adaptively disentangles spatially correlated time series into interpretable sub-components, which facilitates PLM to understand sophisticated spatio-temporal dynamics via a divide-and-conquer strategy. Moreover, we propose a selective discrete reprogramming scheme, which introduces an expanded spatio-temporal vocabulary space to project spatio-temporal series into discrete representations. This scheme minimizes the information loss during reprogramming and enriches the representations derived by PLMs. Extensive experiments on real-world datasets show that the proposed RePST outperforms twelve state-of-the-art baseline methods, particularly in data-scarce scenarios, highlighting the effectiveness and superior generalization capabilities of PLMs for spatio-temporal forecasting. Our codes can be found at https://github.com/usail-hkust/REPST.

AIDec 1, 2022
Human-instructed Deep Hierarchical Generative Learning for Automated Urban Planning

Dongjie Wang, Lingfei Wu, Denghui Zhang et al.

The essential task of urban planning is to generate the optimal land-use configuration of a target area. However, traditional urban planning is time-consuming and labor-intensive. Deep generative learning gives us hope that we can automate this planning process and come up with the ideal urban plans. While remarkable achievements have been obtained, they have exhibited limitations in lacking awareness of: 1) the hierarchical dependencies between functional zones and spatial grids; 2) the peer dependencies among functional zones; and 3) human regulations to ensure the usability of generated configurations. To address these limitations, we develop a novel human-instructed deep hierarchical generative model. We rethink the urban planning generative task from a unique functionality perspective, where we summarize planning requirements into different functionality projections for better urban plan generation. To this end, we develop a three-stage generation process from a target area to zones to grids. The first stage is to label the grids of a target area with latent functionalities to discover functional zones. The second stage is to perceive the planning requirements to form urban functionality projections. We propose a novel module: functionalizer to project the embedding of human instructions and geospatial contexts to the zone-level plan to obtain such projections. Each projection includes the information of land-use portfolios and the structural dependencies across spatial grids in terms of a specific urban function. The third stage is to leverage multi-attentions to model the zone-zone peer dependencies of the functionality projections to generate grid-level land-use configurations. Finally, we present extensive experiments to demonstrate the effectiveness of our framework.

LGMay 31, 2022
Label-Enhanced Graph Neural Network for Semi-supervised Node Classification

Le Yu, Leilei Sun, Bowen Du et al.

Graph Neural Networks (GNNs) have been widely applied in the semi-supervised node classification task, where a key point lies in how to sufficiently leverage the limited but valuable label information. Most of the classical GNNs solely use the known labels for computing the classification loss at the output. In recent years, several methods have been designed to additionally utilize the labels at the input. One part of the methods augment the node features via concatenating or adding them with the one-hot encodings of labels, while other methods optimize the graph structure by assuming neighboring nodes tend to have the same label. To bring into full play the rich information of labels, in this paper, we present a label-enhanced learning framework for GNNs, which first models each label as a virtual center for intra-class nodes and then jointly learns the representations of both nodes and labels. Our approach could not only smooth the representations of nodes belonging to the same class, but also explicitly encode the label semantics into the learning process of GNNs. Moreover, a training node selection technique is provided to eliminate the potential label leakage issue and guarantee the model generalization ability. Finally, an adaptive self-training strategy is proposed to iteratively enlarge the training set with more reliable pseudo labels and distinguish the importance of each pseudo-labeled node during the model training process. Experimental results on both real-world and synthetic datasets demonstrate our approach can not only consistently outperform the state-of-the-arts, but also effectively smooth the representations of intra-class nodes.

AISep 26, 2022
Automated Urban Planning aware Spatial Hierarchies and Human Instructions

Dongjie Wang, Kunpeng Liu, Yanyong Huang et al.

Traditional urban planning demands urban experts to spend considerable time and effort producing an optimal urban plan under many architectural constraints. The remarkable imaginative ability of deep generative learning provides hope for renovating urban planning. While automated urban planners have been examined, they are constrained because of the following: 1) neglecting human requirements in urban planning; 2) omitting spatial hierarchies in urban planning, and 3) lacking numerous urban plan data samples. To overcome these limitations, we propose a novel, deep, human-instructed urban planner. In the preliminary work, we formulate it into an encoder-decoder paradigm. The encoder is to learn the information distribution of surrounding contexts, human instructions, and land-use configuration. The decoder is to reconstruct the land-use configuration and the associated urban functional zones. The reconstruction procedure will capture the spatial hierarchies between functional zones and spatial grids. Meanwhile, we introduce a variational Gaussian mechanism to mitigate the data sparsity issue. Even though early work has led to good results, the performance of generation is still unstable because the way spatial hierarchies are captured may lead to unclear optimization directions. In this journal version, we propose a cascading deep generative framework based on generative adversarial networks (GANs) to solve this problem, inspired by the workflow of urban experts. In particular, the purpose of the first GAN is to build urban functional zones based on information from human instructions and surrounding contexts. The second GAN will produce the land-use configuration based on the functional zones that have been constructed. Additionally, we provide a conditioning augmentation module to augment data samples. Finally, we conduct extensive experiments to validate the efficacy of our work.

LGApr 12, 2022
Continuous-Time User Preference Modelling for Temporal Sets Prediction

Le Yu, Zihang Liu, Leilei Sun et al.

Given a sequence of sets, where each set has a timestamp and contains an arbitrary number of elements, temporal sets prediction aims to predict the elements in the subsequent set. Previous studies for temporal sets prediction mainly focus on the modelling of elements and implicitly represent each user's preference based on his/her interacted elements. However, user preferences are often continuously evolving and the evolutionary trend cannot be fully captured with the indirect learning paradigm of user preferences. To this end, we propose a continuous-time user preference modelling framework for temporal sets prediction, which explicitly models the evolving preference of each user by maintaining a memory bank to store the states of all the users and elements. Specifically, we first construct a universal sequence by arranging all the user-set interactions in a non-descending temporal order, and then chronologically learn from each user-set interaction. For each interaction, we continuously update the memories of the related user and elements based on their currently encoded messages and past memories. Moreover, we present a personalized user behavior learning module to discover user-specific characteristics based on each user's historical sequence, which aggregates the previously interacted elements from dual perspectives according to the user and elements. Finally, we develop a set-batch algorithm to improve the model efficiency, which can create time-consistent batches in advance and achieve 3.5x and 3.0x speedups in the training and evaluation process on average. Experiments on four real-world datasets demonstrate the superiority of our approach over state-of-the-arts under both transductive and inductive settings. The good interpretability of our method is also shown.

LGJul 30, 2024
Co-Neighbor Encoding Schema: A Light-cost Structure Encoding Method for Dynamic Link Prediction

Ke Cheng, Linzhi Peng, Junchen Ye et al.

Structure encoding has proven to be the key feature to distinguishing links in a graph. However, Structure encoding in the temporal graph keeps changing as the graph evolves, repeatedly computing such features can be time-consuming due to the high-order subgraph construction. We develop the Co-Neighbor Encoding Schema (CNES) to address this issue. Instead of recomputing the feature by the link, CNES stores information in the memory to avoid redundant calculations. Besides, unlike the existing memory-based dynamic graph learning method that stores node hidden states, we introduce a hashtable-based memory to compress the adjacency matrix for efficient structure feature construction and updating with vector computation in parallel. Furthermore, CNES introduces a Temporal-Diverse Memory to generate long-term and short-term structure encoding for neighbors with different structural information. A dynamic graph learning framework, Co-Neighbor Encoding Network (CNE-N), is proposed using the aforementioned techniques. Extensive experiments on thirteen public datasets verify the effectiveness and efficiency of the proposed method.

63.1CVMar 27Code
Think over Trajectories: Leveraging Video Generation to Reconstruct GPS Trajectories from Cellular Signaling

Ruixing Zhang, Hanzhang Jiang, Leilei Sun et al.

Mobile devices continuously interact with cellular base stations, generating massive volumes of signaling records that provide broad coverage for understanding human mobility. However, such records offer only coarse location cues (e.g., serving-cell identifiers) and therefore limit their direct use in applications that require high-precision GPS trajectories. This paper studies the Sig2GPS problem: reconstructing GPS trajectories from cellular signaling. Inspired by domain experts often lay the signaling trace on the map and sketch the corresponding GPS route, unlike conventional solutions that rely on complex multi-stage engineering pipelines or regress coordinates, Sig2GPS is reframed as an image-to-video generation task that directly operates in the map-visual domain: signaling traces are rendered on a map, and a video generation model is trained to draw a continuous GPS path. To support this paradigm, a paired signaling-to-trajectory video dataset is constructed to fine-tune an open-source video model, and a trajectory-aware reinforcement learning-based optimization method is introduced to improve generation fidelity via rewards. Experiments on large-scale real-world datasets show substantial improvements over strong engineered and learning-based baselines, while additional results on next GPS prediction indicate scalability and cross-city transferability. Overall, these results suggest that map-visual video generation provides a practical interface for trajectory data mining by enabling direct generation and refinement of continuous paths under map constraints.

CLMay 5, 2022
Exploiting Global and Local Hierarchies for Hierarchical Text Classification

Ting Jiang, Deqing Wang, Leilei Sun et al.

Hierarchical text classification aims to leverage label hierarchy in multi-label text classification. Existing methods encode label hierarchy in a global view, where label hierarchy is treated as the static hierarchical structure containing all labels. Since global hierarchy is static and irrelevant to text samples, it makes these methods hard to exploit hierarchical information. Contrary to global hierarchy, local hierarchy as a structured labels hierarchy corresponding to each text sample. It is dynamic and relevant to text samples, which is ignored in previous methods. To exploit global and local hierarchies,we propose Hierarchy-guided BERT with Global and Local hierarchies (HBGL), which utilizes the large-scale parameters and prior language knowledge of BERT to model both global and local hierarchies.Moreover,HBGL avoids the intentional fusion of semantic and hierarchical modules by directly modeling semantic and hierarchical information with BERT.Compared with the state-of-the-art method HGCLR,our method achieves significant improvement on three benchmark datasets.

AIAug 10, 2023
Adaptive Taxonomy Learning and Historical Patterns Modelling for Patent Classification

Tao Zou, Le Yu, Junchen Ye et al.

Patent classification aims to assign multiple International Patent Classification (IPC) codes to a given patent. Recent methods for automatically classifying patents mainly focus on analyzing the text descriptions of patents. However, apart from the texts, each patent is also associated with some assignees, and the knowledge of their applied patents is often valuable for classification. Furthermore, the hierarchical taxonomy formulated by the IPC system provides important contextual information and enables models to leverage the correlations between IPC codes for more accurate classification. However, existing methods fail to incorporate the above aspects. In this paper, we propose an integrated framework that comprehensively considers the information on patents for patent classification. To be specific, we first present an IPC codes correlations learning module to derive their semantic representations via adaptively passing and aggregating messages within the same level and across different levels along the hierarchical taxonomy. Moreover, we design a historical application patterns learning component to incorporate the corresponding assignee's previous patents by a dual channel aggregation mechanism. Finally, we combine the contextual information of patent texts that contains the semantics of IPC codes, and assignees' sequential preferences to make predictions. Experiments on real-world datasets demonstrate the superiority of our approach over the existing methods. Besides, we present the model's ability to capture the temporal patterns of assignees and the semantic dependencies among IPC codes.

AIAug 4, 2023
Event-based Dynamic Graph Representation Learning for Patent Application Trend Prediction

Tao Zou, Le Yu, Leilei Sun et al.

Accurate prediction of what types of patents that companies will apply for in the next period of time can figure out their development strategies and help them discover potential partners or competitors in advance. Although important, this problem has been rarely studied in previous research due to the challenges in modelling companies' continuously evolving preferences and capturing the semantic correlations of classification codes. To fill in this gap, we propose an event-based dynamic graph learning framework for patent application trend prediction. In particular, our method is founded on the memorable representations of both companies and patent classification codes. When a new patent is observed, the representations of the related companies and classification codes are updated according to the historical memories and the currently encoded messages. Moreover, a hierarchical message passing mechanism is provided to capture the semantic proximities of patent classification codes by updating their representations along the hierarchical taxonomy. Finally, the patent application trend is predicted by aggregating the representations of the target company and classification codes from static, dynamic, and hierarchical perspectives. Experiments on real-world data demonstrate the effectiveness of our approach under various experimental conditions, and also reveal the abilities of our method in learning semantics of classification codes and tracking technology developing trajectories of companies.

LGNov 17, 2023
Regions are Who Walk Them: a Large Pre-trained Spatiotemporal Model Based on Human Mobility for Ubiquitous Urban Sensing

Ruixing Zhang, Liangzhe Han, Leilei Sun et al.

User profiling and region analysis are two tasks of significant commercial value. However, in practical applications, modeling different features typically involves four main steps: data preparation, data processing, model establishment, evaluation, and optimization. This process is time-consuming and labor-intensive. Repeating this workflow for each feature results in abundant development time for tasks and a reduced overall volume of task development. Indeed, human mobility data contains a wealth of information. Several successful cases suggest that conducting in-depth analysis of population movement data could potentially yield meaningful profiles about users and areas. Nonetheless, most related works have not thoroughly utilized the semantic information within human mobility data and trained on a fixed number of the regions. To tap into the rich information within population movement, based on the perspective that Regions Are Who walk them, we propose a large spatiotemporal model based on trajectories (RAW). It possesses the following characteristics: 1) Tailored for trajectory data, introducing a GPT-like structure with a parameter count of up to 1B; 2) Introducing a spatiotemporal fine-tuning module, interpreting trajectories as collection of users to derive arbitrary region embedding. This framework allows rapid task development based on the large spatiotemporal model. We conducted extensive experiments to validate the effectiveness of our proposed large spatiotemporal model. It's evident that our proposed method, relying solely on human mobility data without additional features, exhibits a certain level of relevance in user profiling and region analysis. Moreover, our model showcases promising predictive capabilities in trajectory generation tasks based on the current state, offering the potential for further innovative work utilizing this large spatiotemporal model.

SDOct 28, 2025Code
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

Zihan Liu, Zhikang Niu, Qiuyang Xiao et al.

Despite rapid progress in Multi-modal Large Language Models and Large Audio-Language Models, existing audio benchmarks largely test semantics that can be recovered from text captions, masking deficits in fine-grained perceptual reasoning. We formalize audio 4D intelligence that is defined as reasoning over sound dynamics in time and 3D space, and introduce STAR-Bench to measure it. STAR-Bench combines a Foundational Acoustic Perception setting (six attributes under absolute and relative regimes) with a Holistic Spatio-Temporal Reasoning setting that includes segment reordering for continuous and discrete processes and spatial tasks spanning static localization, multi-source relations, and dynamic trajectories. Our data curation pipeline uses two methods to ensure high-quality samples. For foundational tasks, we use procedurally synthesized and physics-simulated audio. For holistic data, we follow a four-stage process that includes human annotation and final selection based on human performance. Unlike prior benchmarks where caption-only answering reduces accuracy slightly, STAR-Bench induces far larger drops (-31.5\% temporal, -35.2\% spatial), evidencing its focus on linguistically hard-to-describe cues. Evaluating 19 models reveals substantial gaps compared with humans and a capability hierarchy: closed-source models are bottlenecked by fine-grained perception, while open-source models lag across perception, knowledge, and reasoning. Our STAR-Bench provides critical insights and a clear path forward for developing future models with a more robust understanding of the physical world.

LGDec 17, 2025
Behavior Tokens Speak Louder: Disentangled Explainable Recommendation with Behavior Vocabulary

Xinshun Feng, Mingzhe Liu, Yi Qiao et al.

Recent advances in explainable recommendations have explored the integration of language models to analyze natural language rationales for user-item interactions. Despite their potential, existing methods often rely on ID-based representations that obscure semantic meaning and impose structural constraints on language models, thereby limiting their applicability in open-ended scenarios. These challenges are intensified by the complex nature of real-world interactions, where diverse user intents are entangled and collaborative signals rarely align with linguistic semantics. To overcome these limitations, we propose BEAT, a unified and transferable framework that tokenizes user and item behaviors into discrete, interpretable sequences. We construct a behavior vocabulary via a vector-quantized autoencoding process that disentangles macro-level interests and micro-level intentions from graph-based representations. We then introduce multi-level semantic supervision to bridge the gap between behavioral signals and language space. A semantic alignment regularization mechanism is designed to embed behavior tokens directly into the input space of frozen language models. Experiments on three public datasets show that BEAT improves zero-shot recommendation performance while generating coherent and informative explanations. Further analysis demonstrates that our behavior tokens capture fine-grained semantics and offer a plug-and-play interface for integrating complex behavior patterns into large language models.

BMMay 21, 2023Code
Learning Joint 2D & 3D Diffusion Models for Complete Molecule Generation

Han Huang, Leilei Sun, Bowen Du et al.

Designing new molecules is essential for drug discovery and material science. Recently, deep generative models that aim to model molecule distribution have made promising progress in narrowing down the chemical research space and generating high-fidelity molecules. However, current generative models only focus on modeling either 2D bonding graphs or 3D geometries, which are two complementary descriptors for molecules. The lack of ability to jointly model both limits the improvement of generation quality and further downstream applications. In this paper, we propose a new joint 2D and 3D diffusion model (JODO) that generates complete molecules with atom types, formal charges, bond information, and 3D coordinates. To capture the correlation between molecular graphs and geometries in the diffusion process, we develop a Diffusion Graph Transformer to parameterize the data prediction model that recovers the original data from noisy data. The Diffusion Graph Transformer interacts node and edge representations based on our relational attention mechanism, while simultaneously propagating and updating scalar features and geometric vectors. Our model can also be extended for inverse molecular design targeting single or multiple quantum properties. In our comprehensive evaluation pipeline for unconditional joint generation, the results of the experiment show that JODO remarkably outperforms the baselines on the QM9 and GEOM-Drugs datasets. Furthermore, our model excels in few-step fast sampling, as well as in inverse molecule design and molecular graph generation. Our code is provided in https://github.com/GRAPH-0/JODO.

CVJul 23, 2025
Eyes Will Shut: A Vision-Based Next GPS Location Prediction Model by Reinforcement Learning from Visual Map Feed Back

Ruixing Zhang, Yang Zhang, Tongyu Zhu et al.

Next Location Prediction is a fundamental task in the study of human mobility, with wide-ranging applications in transportation planning, urban governance, and epidemic forecasting. In practice, when humans attempt to predict the next location in a trajectory, they often visualize the trajectory on a map and reason based on road connectivity and movement trends. However, the vast majority of existing next-location prediction models do not reason over maps \textbf{in the way that humans do}. Fortunately, the recent development of Vision-Language Models (VLMs) has demonstrated strong capabilities in visual perception and even visual reasoning. This opens up a new possibility: by rendering both the road network and trajectory onto an image and leveraging the reasoning abilities of VLMs, we can enable models to perform trajectory inference in a human-like manner. To explore this idea, we first propose a method called Vision-Guided Location Search (VGLS), which evaluates whether a general-purpose VLM is capable of trajectory-based reasoning without modifying any of its internal parameters. Based on insights from the VGLS results, we further propose our main approach: VLMLocPredictor, which is composed of two stages: In the first stage, we design two Supervised Fine-Tuning (SFT) tasks that help the VLM understand road network and trajectory structures and acquire basic reasoning ability on such visual inputs. In the second stage, we introduce Reinforcement Learning from Visual Map Feedback, enabling the model to self-improve its next-location prediction ability through interaction with the environment. Experiments conducted on datasets from four different cities show that our method achieves state-of-the-art (SOTA) performance and exhibits superior cross-city generalization compared to other LLM-based approaches.

SIOct 26, 2025
JiuTian Chuanliu: A Large Spatiotemporal Model for General-purpose Dynamic Urban Sensing

Liangzhe Han, Leilei Sun, Tongyu Zhu et al.

As a window for urban sensing, human mobility contains rich spatiotemporal information that reflects both residents' behavior preferences and the functions of urban areas. The analysis of human mobility has attracted the attention of many researchers. However, existing methods often address specific tasks from a particular perspective, leading to insufficient modeling of human mobility and limited applicability of the learned knowledge in various downstream applications. To address these challenges, this paper proposes to push massive amounts of human mobility data into a spatiotemporal model, discover latent semantics behind mobility behavior and support various urban sensing tasks. Specifically, a large-scale and widely covering human mobility data is collected through the ubiquitous base station system and a framework named General-purpose and Dynamic Human Mobility Embedding (GDHME) for urban sensing is introduced. The framework follows the self-supervised learning idea and contains two major stages. In stage 1, GDHME treats people and regions as nodes within a dynamic graph, unifying human mobility data as people-region-time interactions. An encoder operating in continuous-time dynamically computes evolving node representations, capturing dynamic states for both people and regions. Moreover, an autoregressive self-supervised task is specially designed to guide the learning of the general-purpose node embeddings. In stage 2, these representations are utilized to support various tasks. To evaluate the effectiveness of our GDHME framework, we further construct a multi-task urban sensing benchmark. Offline experiments demonstrate GDHME's ability to automatically learn valuable node features from vast amounts of data. Furthermore, our framework is used to deploy the JiuTian ChuanLiu Big Model, a system that has been presented at the 2023 China Mobile Worldwide Partner Conference.

LGAug 5, 2025
Urban In-Context Learning: Bridging Pretraining and Inference through Masked Diffusion for Urban Profiling

Ruixing Zhang, Bo Wang, Tongyu Zhu et al.

Urban profiling aims to predict urban profiles in unknown regions and plays a critical role in economic and social censuses. Existing approaches typically follow a two-stage paradigm: first, learning representations of urban areas; second, performing downstream prediction via linear probing, which originates from the BERT era. Inspired by the development of GPT style models, recent studies have shown that novel self-supervised pretraining schemes can endow models with direct applicability to downstream tasks, thereby eliminating the need for task-specific fine-tuning. This is largely because GPT unifies the form of pretraining and inference through next-token prediction. However, urban data exhibit structural characteristics that differ fundamentally from language, making it challenging to design a one-stage model that unifies both pretraining and inference. In this work, we propose Urban In-Context Learning, a framework that unifies pretraining and inference via a masked autoencoding process over urban regions. To capture the distribution of urban profiles, we introduce the Urban Masked Diffusion Transformer, which enables each region' s prediction to be represented as a distribution rather than a deterministic value. Furthermore, to stabilize diffusion training, we propose the Urban Representation Alignment Mechanism, which regularizes the model's intermediate features by aligning them with those from classical urban profiling methods. Extensive experiments on three indicators across two cities demonstrate that our one-stage method consistently outperforms state-of-the-art two-stage approaches. Ablation studies and case studies further validate the effectiveness of each proposed module, particularly the use of diffusion modeling.

CVOct 12, 2021
Deep Human-guided Conditional Variational Generative Modeling for Automated Urban Planning

Dongjie Wang, Kunpeng Liu, Pauline Johnson et al.

Urban planning designs land-use configurations and can benefit building livable, sustainable, safe communities. Inspired by image generation, deep urban planning aims to leverage deep learning to generate land-use configurations. However, urban planning is a complex process. Existing studies usually ignore the need of personalized human guidance in planning, and spatial hierarchical structure in planning generation. Moreover, the lack of large-scale land-use configuration samples poses a data sparsity challenge. This paper studies a novel deep human guided urban planning method to jointly solve the above challenges. Specifically, we formulate the problem into a deep conditional variational autoencoder based framework. In this framework, we exploit the deep encoder-decoder design to generate land-use configurations. To capture the spatial hierarchy structure of land uses, we enforce the decoder to generate both the coarse-grained layer of functional zones, and the fine-grained layer of POI distributions. To integrate human guidance, we allow humans to describe what they need as texts and use these texts as a model condition input. To mitigate training data sparsity and improve model robustness, we introduce a variational Gaussian embedding mechanism. It not just allows us to better approximate the embedding space distribution of training data and sample a larger population to overcome sparsity, but also adds more probabilistic randomness into the urban planning generation to improve embedding diversity so as to improve robustness. Finally, we present extensive experiments to validate the enhanced performances of our method.

LGSep 27, 2021
Analysis for full face mechanical behaviors through spatial deduction model with real-time monitoring data

Xuyan Tan, Yuhang Wang, Bowen Du et al.

Mechanical analysis for the full face of tunnel structure is crucial to maintain stability, which is a challenge in classical analytical solutions and data analysis. Along this line, this study aims to develop a spatial deduction model to obtain the full-faced mechanical behaviors through integrating mechanical properties into pure data-driven model. The spatial tunnel structure is divided into many parts and reconstructed in a form of matrix. Then, the external load applied on structure in the field was considered to study the mechanical behaviors of tunnel. Based on the limited observed monitoring data in matrix and mechanical analysis results, a double-driven model was developed to obtain the full-faced information, in which the data-driven model was the dominant one and the mechanical constraint was the secondary one. To verify the presented spatial deduction model, cross-test was conducted through assuming partial monitoring data are unknown and regarding them as testing points. The well agreement between deduction results with actual monitoring results means the proposed model is reasonable. Therefore, it was employed to deduct both the current and historical performance of tunnel full face, which is crucial to prevent structural disasters.

LGMay 24, 2021
Heterogeneous Graph Representation Learning with Relation Awareness

Le Yu, Leilei Sun, Bowen Du et al.

Representation learning on heterogeneous graphs aims to obtain meaningful node representations to facilitate various downstream tasks, such as node classification and link prediction. Existing heterogeneous graph learning methods are primarily developed by following the propagation mechanism of node representations. There are few efforts on studying the role of relations for improving the learning of more fine-grained node representations. Indeed, it is important to collaboratively learn the semantic representations of relations and discern node representations with respect to different relation types. To this end, in this paper, we propose a novel Relation-aware Heterogeneous Graph Neural Network, namely R-HGNN, to learn node representations on heterogeneous graphs at a fine-grained level by considering relation-aware characteristics. Specifically, a dedicated graph convolution component is first designed to learn unique node representations from each relation-specific graph separately. Then, a cross-relation message passing module is developed to improve the interactions of node representations across different relations. Also, the relation representations are learned in a layer-wise manner to capture relation semantics, which are used to guide the node representation learning process. Moreover, a semantic fusing module is presented to aggregate relation-aware node representations into a compact representation with the learned relation representations. Finally, we conduct extensive experiments on a variety of graph learning tasks, and experimental results demonstrate that our approach consistently outperforms existing methods among all the tasks.

CLJan 9, 2021
LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification

Ting Jiang, Deqing Wang, Leilei Sun et al.

Extreme Multi-label text Classification (XMC) is a task of finding the most relevant labels from a large label set. Nowadays deep learning-based methods have shown significant success in XMC. However, the existing methods (e.g., AttentionXML and X-Transformer etc) still suffer from 1) combining several models to train and predict for one dataset, and 2) sampling negative labels statically during the process of training label ranking model, which reduces both the efficiency and accuracy of the model. To address the above problems, we proposed LightXML, which adopts end-to-end training and dynamic negative labels sampling. In LightXML, we use generative cooperative networks to recall and rank labels, in which label recalling part generates negative and positive labels, and label ranking part distinguishes positive labels from these labels. Through these networks, negative labels are sampled dynamically during label ranking part training by feeding with the same text representation. Extensive experiments show that LightXML outperforms state-of-the-art methods in five extreme multi-label datasets with much smaller model size and lower computational complexity. In particular, on the Amazon dataset with 670K labels, LightXML can reduce the model size up to 72% compared to AttentionXML.

LGDec 29, 2020
Hybrid Micro/Macro Level Convolution for Heterogeneous Graph Learning

Le Yu, Leilei Sun, Bowen Du et al.

Heterogeneous graphs are pervasive in practical scenarios, where each graph consists of multiple types of nodes and edges. Representation learning on heterogeneous graphs aims to obtain low-dimensional node representations that could preserve both node attributes and relation information. However, most of the existing graph convolution approaches were designed for homogeneous graphs, and therefore cannot handle heterogeneous graphs. Some recent methods designed for heterogeneous graphs are also faced with several issues, including the insufficient utilization of heterogeneous properties, structural information loss, and lack of interpretability. In this paper, we propose HGConv, a novel Heterogeneous Graph Convolution approach, to learn comprehensive node representations on heterogeneous graphs with a hybrid micro/macro level convolutional operation. Different from existing methods, HGConv could perform convolutions on the intrinsic structure of heterogeneous graphs directly at both micro and macro levels: A micro-level convolution to learn the importance of nodes within the same relation, and a macro-level convolution to distinguish the subtle difference across different relations. The hybrid strategy enables HGConv to fully leverage heterogeneous information with proper interpretability. Moreover, a weighted residual connection is designed to aggregate both inherent attributes and neighbor information of the focal node adaptively. Extensive experiments on various tasks demonstrate not only the superiority of HGConv over existing methods, but also the intuitive interpretability of our approach for graph analysis.

LGDec 15, 2020
Coupled Layer-wise Graph Convolution for Transportation Demand Prediction

Junchen Ye, Leilei Sun, Bowen Du et al.

Graph Convolutional Network (GCN) has been widely applied in transportation demand prediction due to its excellent ability to capture non-Euclidean spatial dependence among station-level or regional transportation demands. However, in most of the existing research, the graph convolution was implemented on a heuristically generated adjacency matrix, which could neither reflect the real spatial relationships of stations accurately, nor capture the multi-level spatial dependence of demands adaptively. To cope with the above problems, this paper provides a novel graph convolutional network for transportation demand prediction. Firstly, a novel graph convolution architecture is proposed, which has different adjacency matrices in different layers and all the adjacency matrices are self-learned during the training process. Secondly, a layer-wise coupling mechanism is provided, which associates the upper-level adjacency matrix with the lower-level one. It also reduces the scale of parameters in our model. Lastly, a unitary network is constructed to give the final prediction result by integrating the hidden spatial states with gated recurrent unit, which could capture the multi-level spatial dependence and temporal dynamics simultaneously. Experiments have been conducted on two real-world datasets, NYC Citi Bike and NYC Taxi, and the results demonstrate the superiority of our model over the state-of-the-art ones.

CRAug 26, 2020
Defending Water Treatment Networks: Exploiting Spatio-temporal Effects for Cyber Attack Detection

Dongjie Wang, Pengyang Wang, Jingbo Zhou et al.

While Water Treatment Networks (WTNs) are critical infrastructures for local communities and public health, WTNs are vulnerable to cyber attacks. Effective detection of attacks can defend WTNs against discharging contaminated water, denying access, destroying equipment, and causing public fear. While there are extensive studies in WTNs attack detection, they only exploit the data characteristics partially to detect cyber attacks. After preliminary exploring the sensing data of WTNs, we find that integrating spatio-temporal knowledge, representation learning, and detection algorithms can improve attack detection accuracy. To this end, we propose a structured anomaly detection framework to defend WTNs by modeling the spatio-temporal characteristics of cyber attacks in WTNs. In particular, we propose a spatio-temporal representation framework specially tailored to cyber attacks after separating the sensing data of WTNs into a sequence of time segments. This framework has two key components. The first component is a temporal embedding module to preserve temporal patterns within a time segment by projecting the time segment of a sensor into a temporal embedding vector. We then construct Spatio-Temporal Graphs (STGs), where a node is a sensor and an attribute is the temporal embedding vector of the sensor, to describe the state of the WTNs. The second component is a spatial embedding module, which learns the final fused embedding of the WTNs from STGs. In addition, we devise an improved one class-SVM model that utilizes a new designed pairwise kernel to detect cyber attacks. The devised pairwise kernel augments the distance between normal and attack patterns in the fused embedding space. Finally, we conducted extensive experimental evaluations with real-world data to demonstrate the effectiveness of our framework.

LGJun 20, 2020
Predicting Temporal Sets with Deep Neural Networks

Le Yu, Leilei Sun, Bowen Du et al.

Given a sequence of sets, where each set contains an arbitrary number of elements, the problem of temporal sets prediction aims to predict the elements in the subsequent set. In practice, temporal sets prediction is much more complex than predictive modelling of temporal events and time series, and is still an open problem. Many possible existing methods, if adapted for the problem of temporal sets prediction, usually follow a two-step strategy by first projecting temporal sets into latent representations and then learning a predictive model with the latent representations. The two-step approach often leads to information loss and unsatisfactory prediction performance. In this paper, we propose an integrated solution based on the deep neural networks for temporal sets prediction. A unique perspective of our approach is to learn element relationship by constructing set-level co-occurrence graph and then perform graph convolutions on the dynamic relationship graphs. Moreover, we design an attention-based module to adaptively learn the temporal dependency of elements and sets. Finally, we provide a gated updating mechanism to find the hidden shared patterns in different sequences and fuse both static and dynamic information to improve the prediction performance. Experiments on real-world data sets demonstrate that our approach can achieve competitive performances even with a portion of the training data and can outperform existing methods with a significant margin.