LG Nov 21, 2025
Aggregating Direct and Indirect Neighbors through Graph Linear Transformations
Marshall Rosenhoover, Huaming Zhang
Graph neural networks (GNNs) typically rely on localized message passing, requiring increasing depth to capture long-range dependencies. In this work, we introduce Graph Linear Transformations, which realize direct and indirect feature mixing on graphs through a single, well-defined linear operator derived from the graph structure. By interpreting graphs as walk-summable Gaussian graphical models, we compute these transformations via Gaussian Belief Propagation, enabling each node to aggregate information from both direct and indirect neighbors without explicit enumeration of multi-hop paths. We show that different constructions of the underlying precision matrix induce distinct and interpretable propagation biases, ranging from selective edge-level interactions to uniform structural smoothing, and that Graph Linear Transformations can achieve competitive or superior performance compared to both local message-passing GNNs and dynamic neighborhood aggregation models across homophilic and heterophilic benchmark datasets.
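The walk-sum view behind this abstract can be illustrated with a small sketch. It is not the authors' method (they use Gaussian Belief Propagation); it only shows the underlying identity that, for a precision matrix J = I - R with a walk-summable R, the product J^{-1} h equals the sum of R^k h over all walk lengths k. The path graph, the edge weight 0.3, and the names `walk_sum_aggregate` and `matvec` are assumptions chosen for illustration.

```python
def matvec(M, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def walk_sum_aggregate(R, h, num_terms=200):
    """Approximate (I - R)^{-1} h by the truncated series sum_k R^k h.

    Each extra term adds the contribution of walks that are one hop longer,
    so the result mixes direct and indirect neighbors.
    """
    total = list(h)
    term = list(h)
    for _ in range(num_terms):
        term = matvec(R, term)
        total = [t + s for t, s in zip(total, term)]
    return total


# Hypothetical example: path graph 0-1-2-3 with edge weight 0.3.
# The spectral radius of |R| is about 0.49 < 1, so the model is walk-summable.
n, w = 4, 0.3
R = [[0.0] * n for _ in range(n)]
for i in range(n - 1):
    R[i][i + 1] = w
    R[i + 1][i] = w

h = [1.0, 0.0, 0.0, 0.0]          # a signal placed on node 0 only
x = walk_sum_aggregate(R, h)

# Check that x solves J x = h with J = I - R.
J = [[(1.0 if i == j else 0.0) - R[i][j] for j in range(n)] for i in range(n)]
residual = max(abs(a - b) for a, b in zip(matvec(J, x), h))
```

Node 3 is three hops from node 0, yet `x[3]` is non-zero: a single linear solve carries the signal across the whole path, with influence decaying with distance.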
CL Mar 10, 2025
DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation
Ziming You, Yumiao Zhang, Dexuan Xu et al.
Existing large language model (LLM) agents for automating data science show promise, but they remain constrained by narrow task scopes, limited generalization across tasks and models, and over-reliance on state-of-the-art (SOTA) LLMs. We introduce DatawiseAgent, a notebook-centric LLM agent framework for adaptive and robust data science automation. Inspired by how human data scientists work in computational notebooks, DatawiseAgent introduces a unified interaction representation and a multi-stage architecture based on finite-state transducers (FSTs). This design enables flexible long-horizon planning, progressive solution development, and robust recovery from execution failures. Extensive experiments across diverse data science scenarios and models show that DatawiseAgent consistently achieves SOTA performance, surpassing strong baselines such as AutoGen and TaskWeaver and demonstrating superior effectiveness and adaptability. Further evaluations reveal graceful performance degradation under weaker or smaller models, underscoring the framework's robustness and scalability.
SI Feb 22, 2021
Weighted Graph Nodes Clustering via Gumbel Softmax
Deepak Bhaskar Acharya, Huaming Zhang
Graphs are a ubiquitous data structure in data science, widely applied in social networks, knowledge representation graphs, recommendation systems, etc. Given a graph dataset consisting of one or more graphs, which are weighted in general, the first step is often to find clusters in the graphs. In this paper, we present ongoing research results on an algorithm for clustering weighted graph datasets, which we call Weighted Graph Node Clustering via Gumbel Softmax (WGCGS for short). We apply WGCGS to the Karate club weighted network dataset. Our experiments demonstrate that WGCGS can efficiently and effectively find clusters in this dataset. We demonstrate the algorithm's effectiveness by (1) comparing its clustering result with the given labels of the dataset; and (2) comparing various metrics between our clustering algorithm and other state-of-the-art graph clustering algorithms.
LG May 5, 2020
Community Detection Clustering via Gumbel Softmax
Deepak Bhaskar Acharya, Huaming Zhang
Recently, deep learning has been widely adopted in many systems, such as speech recognition and visual processing. In this research, we explore the possibility of using deep learning for community detection in graph datasets. Graphs have gained growing traction in different fields, including social networks, information graphs, recommender systems, and the life sciences. In this paper, we propose a community detection method that clusters the nodes of various graph datasets. We cluster datasets from several categories: affiliation networks, animal networks, human contact networks, human social networks, and miscellaneous networks. The role of deep learning in modeling the interactions between nodes in a network opens the way to a revolution in graph network analysis. In this paper, we extend the Gumbel Softmax approach to graph network clustering. The experimental findings on specific graph datasets show that the new approach significantly outperforms traditional clustering, which strongly supports the efficacy of deep learning in graph community detection. We run a series of experiments on our graph clustering algorithm using various datasets: Zachary karate club, Highland Tribe, Train bombing, American Revolution, Dolphins, Zebra, Windsurfers, Les Misérables, Political books.
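For readers unfamiliar with the Gumbel Softmax relaxation that this abstract builds on, here is a minimal sketch of the sampling step alone. It is not the authors' clustering algorithm: the node logits, the two temperatures, and the function names are hypothetical, and a real model would learn the logits by gradient descent on a clustering objective.

```python
import math
import random


def sample_gumbel(rng, eps=1e-12):
    """Draw one Gumbel(0, 1) sample as -log(-log(u)), with u kept inside (0, 1)."""
    u = min(max(rng.random(), eps), 1.0 - eps)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, tau, rng):
    """Return a relaxed one-hot vector: softmax((logits + Gumbel noise) / tau)."""
    perturbed = [(l + sample_gumbel(rng)) / tau for l in logits]
    m = max(perturbed)                       # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical cluster logits for three nodes and two clusters.
node_logits = [[2.0, 0.1], [0.3, 1.5], [1.0, 1.0]]

# Reusing the seed per node gives both temperatures the same noise,
# so the only difference between the two results is the temperature.
soft = [gumbel_softmax(l, tau=5.0, rng=random.Random(i)) for i, l in enumerate(node_logits)]
sharp = [gumbel_softmax(l, tau=0.1, rng=random.Random(i)) for i, l in enumerate(node_logits)]
```

Each row is a soft cluster assignment that sums to one. Lowering `tau` pushes a row toward a one-hot vector, which is what lets a discrete cluster choice be approximated by a differentiable quantity.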
LG Oct 23, 2019
Feature Selection and Extraction for Graph Neural Networks
Deepak Bhaskar Acharya, Huaming Zhang
Graph Neural Networks (GNNs) have recently become a hot research topic in data science because they use graphs, a ubiquitous data structure, as the underlying elements for constructing and training neural networks. In a GNN, each node has numerous features associated with it. The entire task (for example, classification or clustering) uses the features of the nodes to make decisions at the node level or graph level. In this paper, (1) we extend an existing feature selection algorithm based on Gumbel Softmax to GNNs. We conduct a series of experiments on our feature selection algorithms using the benchmark datasets Cora, Citeseer, and Pubmed. (2) We implement a mechanism to rank the extracted features. We demonstrate the effectiveness of our algorithms for both feature selection and ranking. For the Cora dataset, (1) we use the algorithm to select 225 features out of 1433 features. Our experimental results demonstrate the effectiveness of the selected features for the same classification problem. (2) We extract features that are linear combinations of the original features, where the coefficients for each extracted feature are non-negative and sum to one. We propose an algorithm to rank the extracted features in the sense that, when they are used for the same classification problem, the accuracy decreases gradually across the extracted features ranked 1 - 50, 51 - 100, 101 - 150, and 151 - 200.
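The constraint on extracted features described above (non-negative coefficients that sum to one) can be sketched as follows. Parametrizing the coefficients with a softmax over free logits is an assumption made here for illustration; the abstract states only the constraint, not how it is enforced. The feature values, logits, and function names are hypothetical.

```python
import math


def softmax(logits):
    """Map arbitrary real logits to non-negative weights that sum to one."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def extract_feature(node_features, logits):
    """Return one extracted feature as a convex combination of the original features."""
    weights = softmax(logits)
    value = sum(w * x for w, x in zip(weights, node_features))
    return value, weights


# Hypothetical node with four original features and one set of logits.
node_features = [0.0, 2.0, 5.0, 1.0]
logits = [0.5, -1.0, 2.0, 0.0]

value, weights = extract_feature(node_features, logits)
```

Because the weights form a convex combination, `value` always lies between the smallest and largest original feature, and the largest weight identifies the original feature that dominates the extracted one.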