SE · AI · LG · May 7, 2023

Heterogeneous Directed Hypergraph Neural Network over abstract syntax tree (AST) for Code Classification

arXiv:2305.04228v6 · 4 citations
Originality: Incremental advance
AI Analysis

This work improves code classification for program understanding and automatic coding, though it appears incremental because it builds on existing AST and GNN techniques.

The authors tackle code classification by targeting a limitation of existing AST- and GNN-based methods: they model only pairwise relations and so ignore high-order correlations and structural details of the AST. They propose a heterogeneous directed hypergraph neural network (HDHGN) that outperforms previous methods on Python and Java datasets.

Code classification is a difficult problem in program understanding and automatic coding. Because of the elusive syntax and complicated semantics of programs, most existing studies build code representations for classification with techniques based on the abstract syntax tree (AST) and graph neural networks (GNNs). These techniques exploit the structural and semantic information of code, but they consider only pairwise associations and neglect the high-order data correlations that already exist among nodes belonging to the same field (also called an attribute) in the AST, which can cause code structural information to be lost. Conversely, although a general hypergraph can encode high-order data correlations, it is homogeneous and undirected, so using it to model an AST loses semantic and structural information such as node types, edge types, and the direction between parent and child nodes. In this study, we propose a heterogeneous directed hypergraph (HDHG) to represent the AST and a heterogeneous directed hypergraph neural network (HDHGN) to process this graph for code classification. Our method improves code understanding and can represent high-order data correlations beyond pairwise interactions. We evaluate HDHGN on public datasets of Python and Java programs, where it outperforms previous AST-based and GNN-based methods, demonstrating the capability of our model.
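The abstract describes the HDHG only at a high level. Below is a minimal sketch of one plausible way to build such a structure from a Python AST, under an assumption that is ours rather than the paper's stated construction: every non-empty AST field becomes one typed, directed hyperedge from the parent node (head) to all child nodes stored in that field (tails). Node types are the AST class names and edge types are the field names, which gives the heterogeneity; the head/tail split gives the direction. The names `HyperEdge`, `HDHG`, and `build_hdhg` are hypothetical, only the standard-library `ast` module is used, and the sketch covers graph construction only, not the HDHGN model itself.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class HyperEdge:
    """Hypothetical directed hyperedge: one parent (head) -> all children held in one AST field (tails)."""
    edge_type: str    # AST field name, e.g. "body" or "targets" (edge type)
    head: int         # graph id of the parent node
    tails: list[int]  # graph ids of every child stored in that field


@dataclass
class HDHG:
    """Hypothetical container: node id -> node type, plus the list of typed hyperedges."""
    node_types: list[str] = field(default_factory=list)
    edges: list[HyperEdge] = field(default_factory=list)


def build_hdhg(source: str) -> HDHG:
    """Parse Python source and turn each non-empty AST field into one hyperedge (assumed scheme)."""
    graph = HDHG()

    def visit(node: ast.AST) -> int:
        # Assign a fresh id on every visit; CPython may share instances such as
        # ast.Load(), so ids are not keyed on object identity.
        me = len(graph.node_types)
        graph.node_types.append(type(node).__name__)  # node type = AST class name
        for name, value in ast.iter_fields(node):
            # A field holds either a single value or a list; keep only AST children.
            children = value if isinstance(value, list) else [value]
            children = [c for c in children if isinstance(c, ast.AST)]
            if children:
                tails = [visit(child) for child in children]
                # One hyperedge links the parent to all children of this field at once.
                graph.edges.append(HyperEdge(name, me, tails))
        return me

    visit(ast.parse(source))
    return graph


if __name__ == "__main__":
    g = build_hdhg("x = 1\ny = 2")
    for e in g.edges:
        print(e.edge_type, g.node_types[e.head], "->", [g.node_types[t] for t in e.tails])
```

In this scheme the module's `body` field, which holds both assignments, yields a single hyperedge connecting `Module` to both `Assign` nodes. A pairwise AST edge list would split that into two unrelated edges, which is the kind of high-order correlation the abstract says pairwise methods neglect.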

Code Implementations: 1 repo