Ruitong Zhang

LG
h-index39
7papers
400citations
Novelty36%
AI Score48

7 Papers

LGJun 21, 2022Code
BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs

Kay Liu, Yingtong Dou, Yue Zhao et al.

Detecting which nodes in graphs are outliers is a relatively new machine learning task with numerous applications. Despite the proliferation of algorithms developed in recent years for this task, there has been no standard comprehensive setting for performance evaluation. Consequently, it has been difficult to understand which methods work well and when under a broad range of settings. To bridge this gap, we present--to the best of our knowledge--the first comprehensive benchmark for unsupervised outlier node detection on static attributed graphs called BOND, with the following highlights. (1) We benchmark the outlier detection performance of 14 methods ranging from classical matrix factorization to the latest graph neural networks. (2) Using nine real datasets, our benchmark assesses how the different detection methods respond to two major types of synthetic outliers and separately to "organic" (real non-synthetic) outliers. (3) Using an existing random graph generation technique, we produce a family of synthetically generated datasets of different graph sizes that enable us to compare the running time and memory usage of the different outlier detection algorithms. Based on our experimental results, we discuss the pros and cons of existing graph outlier detection algorithms, and we highlight opportunities for future research. Importantly, our code is freely available and meant to be easily extendable: https://github.com/pygod-team/pygod/tree/main/benchmark

LGApr 26, 2022Code
PyGOD: A Python Library for Graph Outlier Detection

Kay Liu, Yingtong Dou, Xueying Ding et al.

PyGOD is an open-source Python library for detecting outliers in graph data. As the first comprehensive library of its kind, PyGOD supports a wide array of leading graph-based methods for outlier detection under an easy-to-use, well-documented API designed for use by both researchers and practitioners. PyGOD provides modularized components of the different detectors implemented so that users can easily customize each detector for their purposes. To ease the construction of detection workflows, PyGOD offers numerous commonly used utility functions. To scale computation to large graphs, PyGOD supports functionalities for deep models such as sampling and mini-batch processing. PyGOD uses best practices in fostering code reliability and maintainability, including unit testing, continuous integration, and code coverage. To facilitate accessibility, PyGOD is released under a BSD 2-Clause license at https://pygod.org and at the Python Package Index (PyPI).

LGAug 9, 2022Code
Automating DBSCAN via Deep Reinforcement Learning

Ruitong Zhang, Hao Peng, Yingtong Dou et al.

DBSCAN is widely used in many scientific and engineering fields because of its simplicity and practicality. However, due to its high sensitivity parameters, the accuracy of the clustering result depends heavily on practical experience. In this paper, we first propose a novel Deep Reinforcement Learning guided automatic DBSCAN parameters search framework, namely DRL-DBSCAN. The framework models the process of adjusting the parameter search direction by perceiving the clustering environment as a Markov decision process, which aims to find the best clustering parameters without manual assistance. DRL-DBSCAN learns the optimal clustering parameter search policy for different feature distributions via interacting with the clusters, using a weakly-supervised reward training policy network. In addition, we also present a recursive search mechanism driven by the scale of the data to efficiently and controllably process large parameter spaces. Extensive experiments are conducted on five artificial and real-world datasets based on the proposed four working modes. The results of offline and online tasks show that the DRL-DBSCAN not only consistently improves DBSCAN clustering accuracy by up to 26% and 25% respectively, but also can stably find the dominant parameters with high computational efficiency. The code is available at https://github.com/RingBDStack/DRL-DBSCAN.

MLMay 13
Adaptive Kernel Density Estimation with Pre-training

Ruitong Zhang, Ke Deng

Density estimation in high-dimensional settings is an important and challenging statistical problem.Traditional methods based on kernel smoothing are inefficient in high dimensions due to the difficulties in specifying appropriate location-adaptive kernels. In this work, we introduce pre-training, a key idea behind many cutting-edge AI technologies, to the context of non-parametric density estimation. By establishing a pre-trained neural network that can recommend an appropriate location-adaptive kernel for each sample point, efficient density estimation with adaptive kernels is achieved in high dimensions. A wide range of numerical experiments show that this strategy is highly effective for improving density-estimation accuracy, when the target distribution is close to the distribution family for pre-training. When the target distribution is substantially different from the pre-training distribution family, the benefit from the proposed pre-training strategy may be diluted, but can be reactivated by an additional fine-tuning procedure.

LGOct 15, 2025
STAR: Boosting Time Series Foundation Models for Anomaly Detection through State-aware Adapter

Hanyin Cheng, Ruitong Zhang, Yuning Lu et al.

While Time Series Foundation Models (TSFMs) have demonstrated remarkable success in Multivariate Time Series Anomaly Detection (MTSAD), however, in real-world industrial scenarios, many time series comprise not only numerical variables such as temperature and flow, but also numerous discrete state variables that describe the system status, such as valve on/off or day of the week. Existing TSFMs often overlook the distinct categorical nature of state variables and their critical role as conditions, typically treating them uniformly with numerical variables. This inappropriate modeling approach prevents the model from fully leveraging state information and even leads to a significant degradation in detection performance after state variables are integrated. To address this critical limitation, this paper proposes a novel STate-aware AdapteR (STAR). STAR is a plug-and-play module designed to enhance the capability of TSFMs in modeling and leveraging state variables during the fine-tuning stage. Specifically, STAR comprisesthree core components: (1) We design an Identity-guided State Encoder, whicheffectively captures the complex categorical semantics of state variables through a learnable State Memory. (2) We propose a Conditional Bottleneck Adapter, which dynamically generates low-rank adaptation parameters conditioned on the current state, thereby flexibly injecting the influence of state variables into the backbone model. (3) We also introduce a Numeral-State Matching module to more effectively detect anomalies inherent to the state variables themselves. Extensive experiments conducted on real-world datasets demonstrate that STAR can improve the performance of existing TSFMs on MTSAD.

LGMay 7, 2025
Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning

Hao Peng, Xiang Huang, Shuo Sun et al.

DBSCAN, a well-known density-based clustering algorithm, has gained widespread popularity and usage due to its effectiveness in identifying clusters of arbitrary shapes and handling noisy data. However, it encounters challenges in producing satisfactory cluster results when confronted with datasets of varying density scales, a common scenario in real-world applications. In this paper, we propose a novel Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning cluster framework, namely AR-DBSCAN. First, we model the initial dataset as a two-level encoding tree and categorize the data vertices into distinct density partitions according to the information uncertainty determined in the encoding tree. Each partition is then assigned to an agent to find the best clustering parameters without manual assistance. The allocation is density-adaptive, enabling AR-DBSCAN to effectively handle diverse density distributions within the dataset by utilizing distinct agents for different partitions. Second, a multi-agent deep reinforcement learning guided automatic parameter searching process is designed. The process of adjusting the parameter search direction by perceiving the clustering environment is modeled as a Markov decision process. Using a weakly-supervised reward training policy network, each agent adaptively learns the optimal clustering parameters by interacting with the clusters. Third, a recursive search mechanism adaptable to the data's scale is presented, enabling efficient and controlled exploration of large parameter spaces. Extensive experiments are conducted on nine artificial datasets and a real-world dataset. The results of offline and online tasks show that AR-DBSCAN not only improves clustering accuracy by up to 144.1% and 175.3% in the NMI and ARI metrics, respectively, but also is capable of robustly finding dominant parameters.

LGApr 16, 2021
Reinforced Neighborhood Selection Guided Multi-Relational Graph Neural Networks

Hao Peng, Ruitong Zhang, Yingtong Dou et al.

Graph Neural Networks (GNNs) have been widely used for the representation learning of various structured graph data. While promising, most existing GNNs oversimplified the complexity and diversity of the edges in the graph, and thus inefficient to cope with ubiquitous heterogeneous graphs, which are typically in the form of multi-relational graph representations. In this paper, we propose RioGNN, a novel Reinforced, recursive and flexible neighborhood selection guided multi-relational Graph Neural Network architecture, to navigate complexity of neural network structures whilst maintaining relation-dependent representations. We first construct a multi-relational graph, according to the practical task, to reflect the heterogeneity of nodes, edges, attributes and labels. To avoid the embedding over-assimilation among different types of nodes, we employ a label-aware neural similarity measure to ascertain the most similar neighbors based on node attributes. A reinforced relation-aware neighbor selection mechanism is developed to choose the most similar neighbors of a targeting node within a relation before aggregating all neighborhood information from different relations to obtain the eventual node embedding. Particularly, to improve the efficiency of neighbor selecting, we propose a new recursive and scalable reinforcement learning framework with estimable depth and width for different scales of multi-relational graphs. RioGNN can learn more discriminative node embedding with enhanced explainability due to the recognition of individual importance of each relation via the filtering threshold mechanism. Comprehensive experiments on real-world graph data and practical tasks demonstrate the advancements of effectiveness, efficiency and the model explainability, as opposed to other comparative GNN models.