Yongyu Wang

LG
h-index1
16papers
144citations
Novelty52%
AI Score48

16 Papers

DSDec 21, 2018
Nearly-Linear Time Spectral Graph Reduction for Scalable Graph Partitioning and Data Visualization

Zhiqiang Zhao, Yongyu Wang, Zhuo Feng

This paper proposes a scalable algorithmic framework for spectral reduction of large undirected graphs. The proposed method allows computing much smaller graphs while preserving the key spectral (structural) properties of the original graph. Our framework is built upon the following two key components: a spectrum-preserving node aggregation (reduction) scheme, as well as a spectral graph sparsification framework with iterative edge weight scaling. We show that the resulting spectrally-reduced graphs can robustly preserve the first few nontrivial eigenvalues and eigenvectors of the original graph Laplacian. In addition, the spectral graph reduction method has been leveraged to develop much faster algorithms for multilevel spectral graph partitioning as well as t-distributed Stochastic Neighbor Embedding (t-SNE) of large data sets. We conducted extensive experiments using a variety of large graphs and data sets, and obtained very promising results. For instance, we are able to reduce the "coPapersCiteseer" graph with 0.43 million nodes and 16 million edges to a much smaller graph with only 13K (32X fewer) nodes and 17K (950X fewer) edges in about 16 seconds; the spectrally-reduced graphs also allow us to achieve up to 1100X speedup for spectral graph partitioning and up to 60X speedup for t-SNE visualization of large data sets.

LGApr 19, 2023
Accelerate Support Vector Clustering via Spectrum-Preserving Data Compression

Yuxuan Song, Yongyu Wang

This paper proposes a novel framework for accelerating support vector clustering. The proposed method first computes much smaller compressed data sets while preserving the key cluster properties of the original data sets based on a novel spectral data compression approach. Then, the resultant spectrally-compressed data sets are leveraged for the development of fast and high quality algorithm for support vector clustering. We conducted extensive experiments using real-world data sets and obtained very promising results. The proposed method allows us to achieve 100X and 115X speedups over the state of the art SVC method on the Pendigits and USPS data sets, respectively, while achieving even better clustering quality. To the best of our knowledge, this represents the first practical method for high-quality and fast SVC on large-scale real-world data sets

LGJun 7, 2023
Towards High-Performance Exploratory Data Analysis (EDA) Via Stable Equilibrium Point

Yuxuan Song, Yongyu Wang

Exploratory data analysis (EDA) is a vital procedure for data science projects. In this work, we introduce a stable equilibrium point (SEP) - based framework for improving the efficiency and solution quality of EDA. By exploiting the SEPs to be the representative points, our approach aims to generate high-quality clustering and data visualization for large-scale data sets. A very unique property of the proposed method is that the SEPs will directly encode the clustering properties of data sets. Compared with prior state-of-the-art clustering and data visualization methods, the proposed methods allow substantially improving computing efficiency and solution quality for large-scale data analysis tasks.

LGMay 15
T2T-LA: A Topology-to-Topology LLM Agent for Graph Learning with Neither Feature Access nor Task Knowledge

Yongyu Wang

Graph learning aims to convert data into graph representations, which are fundamental to many problems in machine learning for CAD, where circuits, layouts, designs, and optimization states are often modeled as graph-structured objects. Existing graph learning methods usually rely on carefully designed graph construction rules, extensive parameter tuning, and sophisticated mathematical theory; moreover, achieving good performance often requires task-specific graph construction tailored to the downstream objective. In this work, we study whether a large language model (LLM) can reason about graph structure and infer a useful topology without observing the feature matrix, without knowing the downstream task, and without relying on any carefully designed graph construction algorithm or parameter tuning process. To this end, we propose T2T-LA, a Topology-to-Topology LLM Agent that receives no input other than a set of previously failed topologies and the scores assigned to them by a private scorer. The agent is not told what task or algorithm produces the scores, how these topologies are generated, or what the scores mean. Since none of the observed topologies is satisfactory, T2T-LA cannot simply imitate a good example. Instead, it is forced to infer hidden relationships between graph connectivity patterns and the observed scores, a capability that is particularly relevant to CAD scenarios where useful design structures may be difficult to specify manually. Experimental results show that T2T-LA can generate, in one shot, a graph topology that enables the downstream algorithm to produce a sufficiently good solution, suggesting a new LLM-driven direction for topology reasoning and graph representation learning in ML-for-CAD workflows.

LGMay 14
From Llama to Cria: Scaling Down Neural Networks via Neuron-Level Spectral Structural Importance Evaluation

Yongyu Wang

This paper proposes a neuron pruning framework based on neuron-level spectral structural importance evaluation. Given a trained neural network, we record the hidden states of each hidden layer during inference and model neurons as graph nodes, with hidden states treated as graph signals. Using ideas from graph signal processing, we infer layer-wise input and output graphs that characterize the structural relationships among neurons before and after each layer transformation. We then evaluate the spectral structural importance of neurons by analyzing the transformation between these graphs based on spectral graph theory. Neurons with high spectral structural importance are regarded as strongly involved in the internal representation transformation and are therefore preserved, while neurons with low importance scores are selected as pruning candidates. The pruning process is conducted iteratively until a predefined effective parameter reduction target is reached. Instead of fine-tuning after every pruning step, the proposed strategy first removes low-importance neurons to obtain a compact architecture and then applies a final recovery fine-tuning stage to restore task performance. By connecting neuron pruning with graph signal processing and spectral structural analysis, the proposed framework offers a principled way to reduce neural network size while maintaining solution quality. Experimental results on CIFAR-10 image classification and SST-2 sentiment classification show that our method can effectively remove low-importance neurons and achieve compact networks with competitive performance after recovery fine-tuning.

LGMay 5
Improving Graph Neural Network Training, Defense, Spectral Hypergraph Clustering and Multiview Spectral Clustering via Adversarial Robustness Evaluation

Yongyu Wang

Graph Neural Networks (GNNs) are a highly effective neural network architecture for processing graph-structured data. Unlike traditional neural networks that rely solely on the features of the data as input, GNNs leverage both the graph structure, which represents the relationships between data points, and the feature matrix of the data to optimize their feature representation. This unique capability enables GNNs to achieve superior performance across various tasks. However, it also makes GNNs more susceptible to noise and adversarial attacks from both the graph structure and data features, which can significantly increase the training difficulty and degrade their performance. Similarly, a hypergraph is a highly complex structure, and partitioning a hypergraph is a challenging task. This paper leverages spectral adversarial robustness evaluation to effectively address key challenges in complex-graph algorithms. By using spectral adversarial robustness evaluation to distinguish robust nodes from non-robust ones and treating them differently, we propose a training-set construction strategy that improves the training quality of GNNs. In addition, we develop algorithms to enhance both the adversarial robustness of GNNs and the performance of hypergraph clustering. Experimental results show that this series of methods is highly effective.

CVNov 18, 2024
Enabling DBSCAN for Very Large-Scale High-Dimensional Spaces

Yongyu Wang

DBSCAN is one of the most important non-parametric unsupervised data analysis tools. By applying DBSCAN to a dataset, two key analytical results can be obtained: (1) clustering data points based on density distribution and (2) identifying outliers in the dataset. However, the time complexity of the DBSCAN algorithm is $O(n^2 β)$, where $n$ is the number of data points and $β= O(D)$, with $D$ representing the dimensionality of the data space. As a result, DBSCAN becomes computationally infeasible when both $n$ and $D$ are large. In this paper, we propose a DBSCAN method based on spectral data compression, capable of efficiently processing datasets with a large number of data points ($n$) and high dimensionality ($D$). By preserving only the most critical structural information during the compression process, our method effectively removes substantial redundancy and noise. Consequently, the solution quality of DBSCAN is significantly improved, enabling more accurate and reliable results.

LGDec 19, 2024
Training Graph Neural Networks Using Non-Robust Samples

Yongyu Wang

Graph Neural Networks (GNNs) are a highly effective neural network architecture for processing graph -- structured data. Unlike traditional neural networks that rely solely on the features of the data as input, GNNs leverage both the graph structure, which represents the relationships between data points, and the feature matrix of the data to optimize their feature representation. This unique capability enables GNNs to achieve superior performance across various tasks. However, it also makes GNNs more susceptible to noise from both the graph structure and data features, which can significantly increase the training difficulty and degrade their performance. To address this issue, this paper proposes a novel method for selecting noise-sensitive training samples from the original training set to construct a smaller yet more effective training set for model training. These samples are used to help improve the model's ability to correctly process data in noisy environments. We have evaluated our approach on three of the most classical GNN models -- GCN, GAT, and GraphSAGE -- as well as three widely used benchmark datasets: Cora, Citeseer, and PubMed. Our experiments demonstrate that the proposed method can substantially boost the training of Graph Neural Networks compared to using randomly sampled training sets of the same size from the original training set and the larger original full training set.

LGDec 14, 2024
Improving Graph Neural Networks via Adversarial Robustness Evaluation

Yongyu Wang

Graph Neural Networks (GNNs) are currently one of the most powerful types of neural network architectures. Their advantage lies in the ability to leverage both the graph topology, which represents the relationships between samples, and the features of the samples themselves. However, the given graph topology often contains noisy edges, and GNNs are vulnerable to noise in the graph structure. This issue remains unresolved. In this paper, we propose using adversarial robustness evaluation to select a small subset of robust nodes that are less affected by noise. We then only feed the features of these robust nodes, along with the KNN graph constructed from these nodes, into the GNN for classification. Additionally, we compute the centroids for each class. For the remaining non-robust nodes, we assign them to the class whose centroid is closest to them. Experimental results show that this method significantly improves the accuracy of GNNs.

CVNov 19, 2024
Accelerating UMAP for Large-Scale Datasets Through Spectral Coarsening

Yongyu Wang

This paper introduces an innovative approach to dramatically accelerate UMAP using spectral data compression.The proposed method significantly reduces the size of the dataset, preserving its essential manifold structure through an advanced spectral compression technique. This allows UMAP to perform much faster while maintaining the quality of its embeddings. Experiments on real-world datasets, such as USPS, demonstrate the method's ability to achieve substantial data reduction without compromising embedding fidelity.

LGJan 28, 2024
Mitigating the Impact of Noisy Edges on Graph-Based Algorithms via Adversarial Robustness Evaluation

Yongyu Wang, Xiaotian Zhuang

Given that no existing graph construction method can generate a perfect graph for a given dataset, graph-based algorithms are often affected by redundant and erroneous edges present within the constructed graphs. In this paper, we view these noisy edges as adversarial attack and propose to use a spectral adversarial robustness evaluation method to mitigate the impact of noisy edges on the performance of graph-based algorithms. Our method identifies the points that are less vulnerable to noisy edges and leverages only these robust points to perform graph-based algorithms. Our experiments demonstrate that our methodology is highly effective and outperforms state-of-the-art denoising methods by a large margin.

CVOct 25, 2021
Accelerate 3D Object Processing via Spectral Layout

Yongyu Wang

3D image processing is an important problem in computer vision and pattern recognition fields. Compared with 2D image processing, its computation difficulty and cost are much higher due to the extra dimension. To fundamentally address this problem, we propose to embed the essential information in a 3D object into 2D space via spectral layout. Specifically, we construct a 3D adjacency graph to capture spatial structure of the 3D voxel grid. Then we calculate the eigenvectors corresponding to the second and third smallest eigenvalues of its graph Laplacian and perform spectral layout to map each voxel into a pixel in 2D Cartesian coordinate plane. The proposed method can achieve high quality 2D representations for 3D objects, which enables to use 2D-based methods to process 3D objects. The experimental results demonstrate the effectiveness and efficiency of our method.

LGOct 24, 2021
Improving Spectral Clustering Using Spectrum-Preserving Node Aggregation

Yongyu Wang

Spectral clustering is one of the most popular clustering methods. However, the high computational cost due to the involved eigen-decomposition procedure can immediately hinder its applications in large-scale tasks. In this paper we use spectrum-preserving node reduction to accelerate eigen-decomposition and generate concise representations of data sets. Specifically, we create a small number of pseudonodes based on spectral similarity. Then, standard spectral clustering algorithm is performed on the smaller node set. Finally, each data point in the original data set is assigned to the cluster as its representative pseudo-node. The proposed framework run in nearly-linear time. Meanwhile, the clustering accuracy can be significantly improved by mining concise representations. The experimental results show dramatically improved clustering performance when compared with state-of-the-art methods.

LGNov 23, 2019
GRASPEL: Graph Spectral Learning at Scale

Yongyu Wang, Zhiqiang Zhao, Zhuo Feng

Learning meaningful graphs from data plays important roles in many data mining and machine learning tasks, such as data representation and analysis, dimension reduction, data clustering, and visualization, etc. In this work, for the first time, we present a highly-scalable spectral approach (GRASPEL) for learning large graphs from data. By limiting the precision matrix to be a graph Laplacian, our approach aims to estimate ultra-sparse (tree-like) weighted undirected graphs and shows a clear connection with the prior graphical Lasso method. By interleaving the latest high-performance nearly-linear time spectral methods for graph sparsification, coarsening and embedding, ultra-sparse yet spectrally-robust graphs can be learned by identifying and including the most spectrally-critical edges into the graph. Compared with prior state-of-the-art graph learning approaches, GRASPEL is more scalable and allows substantially improving computing efficiency and solution quality of a variety of data mining and machine learning applications, such as spectral clustering (SC), and t-Distributed Stochastic Neighbor Embedding (t-SNE). {For example, when comparing with graphs constructed using existing methods, GRASPEL achieved the best spectral clustering efficiency and accuracy.

LGOct 6, 2019
GraphZoom: A multi-level spectral approach for accurate and scalable graph embedding

Chenhui Deng, Zhiqiang Zhao, Yongyu Wang et al.

Graph embedding techniques have been increasingly deployed in a multitude of different applications that involve learning on non-Euclidean data. However, existing graph embedding models either fail to incorporate node attribute information during training or suffer from node attribute noise, which compromises the accuracy. Moreover, very few of them scale to large graphs due to their high computational complexity and memory usage. In this paper we propose GraphZoom, a multi-level framework for improving both accuracy and scalability of unsupervised graph embedding algorithms. GraphZoom first performs graph fusion to generate a new graph that effectively encodes the topology of the original graph and the node attribute information. This fused graph is then repeatedly coarsened into much smaller graphs by merging nodes with high spectral similarities. GraphZoom allows any existing embedding methods to be applied to the coarsened graph, before it progressively refine the embeddings obtained at the coarsest level to increasingly finer graphs. We have evaluated our approach on a number of popular graph datasets for both transductive and inductive tasks. Our experiments show that GraphZoom can substantially increase the classification accuracy and significantly accelerate the entire graph embedding process by up to 40.8x, when compared to the state-of-the-art unsupervised embedding methods.

LGOct 12, 2017
Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification

Yongyu Wang, Zhuo Feng

The eigendeomposition of nearest-neighbor (NN) graph Laplacian matrices is the main computational bottleneck in spectral clustering. In this work, we introduce a highly-scalable, spectrum-preserving graph sparsification algorithm that enables to build ultra-sparse NN (u-NN) graphs with guaranteed preservation of the original graph spectrums, such as the first few eigenvectors of the original graph Laplacian. Our approach can immediately lead to scalable spectral clustering of large data networks without sacrificing solution quality. The proposed method starts from constructing low-stretch spanning trees (LSSTs) from the original graphs, which is followed by iteratively recovering small portions of "spectrally critical" off-tree edges to the LSSTs by leveraging a spectral off-tree embedding scheme. To determine the suitable amount of off-tree edges to be recovered to the LSSTs, an eigenvalue stability checking scheme is proposed, which enables to robustly preserve the first few Laplacian eigenvectors within the sparsified graph. Additionally, an incremental graph densification scheme is proposed for identifying extra edges that have been missing in the original NN graphs but can still play important roles in spectral clustering tasks. Our experimental results for a variety of well-known data sets show that the proposed method can dramatically reduce the complexity of NN graphs, leading to significant speedups in spectral clustering.