LGJul 25, 2024
RIDA: A Robust Attack Framework on Incomplete GraphsJianke Yu, Hanchen Wang, Chen Chen et al.
Graph Neural Networks (GNNs) are vital in data science but are increasingly susceptible to adversarial attacks. To help researchers develop more robust GNN models, it's essential to focus on designing strong attack models as foundational benchmarks and guiding references. Among adversarial attacks, gray-box poisoning attacks are noteworthy due to their effectiveness and fewer constraints. These attacks exploit GNNs' need for retraining on updated data, thereby impacting their performance by perturbing these datasets. However, current research overlooks the real-world scenario of incomplete graphs. To address this gap, we introduce the Robust Incomplete Deep Attack Framework (RIDA). It is the first algorithm for robust gray-box poisoning attacks on incomplete graphs. The approach innovatively aggregates distant vertex information and ensures powerful data utilization. Extensive tests against 9 SOTA baselines on 3 real-world datasets demonstrate that RIDA's superiority in handling incompleteness and high attack performance on the incomplete graph.
AIMay 18
EXG: Self-Evolving Agents with Experience GraphsYuxin Jin, Siyuan Zhang, Hanchen Wang et al.
Large language model (LLM)-based agents have demonstrated strong capabilities in complex reasoning and problem solving through multi-step interactions, yet most deployed agents remain behaviorally static, with knowledge acquired during execution rarely translating into systematic improvement over time. In response, a growing line of work on self-evolving agents explores how agents can improve through experience during deployment, but most existing approaches either rely on ad hoc reflection limited to single-task correction or adopt unstructured memory that accumulates fragmented experience with delayed usability. To address this limitation, we introduce EXG, an experience graph framework for self-evolving agents that explicitly organizes accumulated successes and failures into a structured, relational representation. EXG is the first experience graph designed for self-evolving agents, supporting both online, real-time graph growth during execution for immediate cross-task experience reuse, and offline reuse of a consolidated experience graph as an external memory module. This design also enables EXG to serve as a plug-and-play component for existing self-evolving agents, organizing prior experience into a unified experience graph and improving both solution quality and resource efficiency as deployment progresses. Extensive experiments across code generation and reasoning benchmarks show that EXG attains more favorable performance-efficiency trade-offs than reflection- and memory-based baselines in both online and offline evaluations. Our results suggest that structuring experience as a graph provides a principled foundation for scalable and transferable self-evolving agent behavior.
LGMay 12, 2020Code
GoGNN: Graph of Graphs Neural Network for Predicting Structured Entity InteractionsHanchen Wang, Defu Lian, Ying Zhang et al.
Entity interaction prediction is essential in many important applications such as chemistry, biology, material science, and medical science. The problem becomes quite challenging when each entity is represented by a complex structure, namely structured entity, because two types of graphs are involved: local graphs for structured entities and a global graph to capture the interactions between structured entities. We observe that existing works on structured entity interaction prediction cannot properly exploit the unique graph of graphs model. In this paper, we propose a Graph of Graphs Neural Network, namely GoGNN, which extracts the features in both structured entity graphs and the entity interaction graph in a hierarchical way. We also propose the dual-attention mechanism that enables the model to preserve the neighbor importance in both levels of graphs. Extensive experiments on real-world datasets show that GoGNN outperforms the state-of-the-art methods on two representative structured entity interaction prediction tasks: chemical-chemical interaction prediction and drug-drug interaction prediction. Our code is available at Github.
IRAug 14, 2024
Towards Fair and Rigorous Evaluations: Hyperparameter Optimization for Top-N Recommendation Task with Implicit FeedbackHui Fang, Xu Feng, Lu Qin et al.
The widespread use of the internet has led to an overwhelming amount of data, which has resulted in the problem of information overload. Recommender systems have emerged as a solution to this problem by providing personalized recommendations to users based on their preferences and historical data. However, as recommendation models become increasingly complex, finding the best hyperparameter combination for different models has become a challenge. The high-dimensional hyperparameter search space poses numerous challenges for researchers, and failure to disclose hyperparameter settings may impede the reproducibility of research results. In this paper, we investigate the Top-N implicit recommendation problem and focus on optimizing the benchmark recommendation algorithm commonly used in comparative experiments using hyperparameter optimization algorithms. We propose a research methodology that follows the principles of a fair comparison, employing seven types of hyperparameter search algorithms to fine-tune six common recommendation algorithms on three datasets. We have identified the most suitable hyperparameter search algorithms for various recommendation algorithms on different types of datasets as a reference for later study. This study contributes to algorithmic research in recommender systems based on hyperparameter optimization, providing a fair basis for comparison.
LGSep 2, 2025
HiGraph: A Large-Scale Hierarchical Graph Dataset for Malware AnalysisHan Chen, Hanchen Wang, Hongmei Chen et al.
The advancement of graph-based malware analysis is critically limited by the absence of large-scale datasets that capture the inherent hierarchical structure of software. Existing methods often oversimplify programs into single level graphs, failing to model the crucial semantic relationship between high-level functional interactions and low-level instruction logic. To bridge this gap, we introduce \dataset, the largest public hierarchical graph dataset for malware analysis, comprising over \textbf{200M} Control Flow Graphs (CFGs) nested within \textbf{595K} Function Call Graphs (FCGs). This two-level representation preserves structural semantics essential for building robust detectors resilient to code obfuscation and malware evolution. We demonstrate HiGraph's utility through a large-scale analysis that reveals distinct structural properties of benign and malicious software, establishing it as a foundational benchmark for the community. The dataset and tools are publicly available at https://higraph.org.
LGJan 25, 2022
Reinforcement Learning Based Query Vertex Ordering Model for Subgraph MatchingHanchen Wang, Ying Zhang, Lu Qin et al.
Subgraph matching is a fundamental problem in various fields that use graph structured data. Subgraph matching algorithms enumerate all isomorphic embeddings of a query graph q in a data graph G. An important branch of matching algorithms exploit the backtracking search approach which recursively extends intermediate results following a matching order of query vertices. It has been shown that the matching order plays a critical role in time efficiency of these backtracking based subgraph matching algorithms. In recent years, many advanced techniques for query vertex ordering (i.e., matching order generation) have been proposed to reduce the unpromising intermediate results according to the preset heuristic rules. In this paper, for the first time we apply the Reinforcement Learning (RL) and Graph Neural Networks (GNNs) techniques to generate the high-quality matching order for subgraph matching algorithms. Instead of using the fixed heuristics to generate the matching order, our model could capture and make full use of the graph information, and thus determine the query vertex order with the adaptive learning-based rule that could significantly reduces the number of redundant enumerations. With the help of the reinforcement learning framework, our model is able to consider the long-term benefits rather than only consider the local information at current ordering step.Extensive experiments on six real-life data graphs demonstrate that our proposed matching order generation technique could reduce up to two orders of magnitude of query processing time compared to the state-of-the-art algorithms.
CRJan 17, 2022
Correlation Cube Attack Revisited: Improved Cube Search and Superpoly Recovery TechniquesJianhua Wang, Lu Qin, Baofeng Wu
In this paper, we improve the cube attack by exploiting low-degree factors of the superpoly w.r.t. certain "special" index set of cube (ISoC). This can be viewed as a special case of the correlation cube attack proposed at Eurocrypt 2018, but under our framework more beneficial equations on the key variables can be obtained in the key-recovery phase. To mount our attack, one has two challenging problems: effectively recover algebraic normal form of the superpoly and extract out its low-degree factors; and efficiently search a large quantity of good ISoCs. We bring in new techniques to solve both of them. First, we propose the variable substitution technique for middle rounds of a cipher, in which polynomials on the key variables in the algebraic expressions of internal states are substituted by new variables. This will improve computational complexity of the superpoly recovery and promise more compact superpolys that can be easily decomposed with respect to the new variables. Second, we propose the vector numeric mapping technique, which seeks out a tradeoff between efficiency of the numeric mapping technique and accuracy of the monomial prediction technique in degree evaluation of superpolys. Combining with this technique, a fast pruning method is given and modeled by MILP to filter good ISoCs of which the algebraic degree satisfies some fixed threshold. Thanks to automated MILP solvers, it becomes practical to comprehensively search for good cubes across the entire search space. To illustrate the power of our techniques, we apply all of them to Trivium stream cipher. The previous best practical key recovery attack was on 820-round Trivium with complexity $2^{53.17}$. We put forward 820-, 825- and 830-round practical key-recovery attacks, in which there are 2^{80}\times 87.8%, 2^{80}\times 83% and 2^{80}\times 65.7% keys that could be practically recovered, respectively.
CLApr 20, 2020
Taming the Expressiveness and Programmability of Graph Analytical QueriesLu Qin, Longbin Lai, Kongzhang Hao et al.
Graph database has enjoyed a boom in the last decade, and graph queries accordingly gain a lot of attentions from both the academia and industry. We focus on analytical queries in this paper. While analyzing existing domain-specific languages (DSLs) for analytical queries regarding the perspectives of completeness, expressiveness and programmability, we find out that none of existing work has achieved a satisfactory coverage of these perspectives. Motivated by this, we propose the \flash DSL, which is named after the three primitive operators Filter, LocAl and PuSH. We prove that \flash is Turing complete (completeness), and show that it achieves both good expressiveness and programmability for analytical queries. We provide an implementation of \flash based on code generation, and compare it with native C++ codes and existing DSL using representative queries. The experiment results demonstrate \flash's expressiveness, and its capability of programming complex algorithms that achieve satisfactory runtime.
LGApr 19, 2020
Binarized Graph Neural NetworkHanchen Wang, Defu Lian, Ying Zhang et al.
Recently, there have been some breakthroughs in graph analysis by applying the graph neural networks (GNNs) following a neighborhood aggregation scheme, which demonstrate outstanding performance in many tasks. However, we observe that the parameters of the network and the embedding of nodes are represented in real-valued matrices in existing GNN-based graph embedding approaches which may limit the efficiency and scalability of these models. It is well-known that binary vector is usually much more space and time efficient than the real-valued vector. This motivates us to develop a binarized graph neural network to learn the binary representations of the nodes with binary network parameters following the GNN-based paradigm. Our proposed method can be seamlessly integrated into the existing GNN-based embedding approaches to binarize the model parameters and learn the compact embedding. Extensive experiments indicate that the proposed binarized graph neural network, namely BGN, is orders of magnitude more efficient in terms of both time and space while matching the state-of-the-art performance.